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Abstract 

Automatic thresholding has been widely used in the 
domain of image processing for automated segmentation 
images. The Otsu’s method is a commonly used 
thresholding technique. It provides satisfactory results for 
thresholding images with bimodal distribution histogram, 
but it is impractical when extended to multilevel 
thresholding. In this paper, a new automatic thresholding 
method for colour image segmentation called the TSMO 
(Two-Stage Multi-level Thresholding) is presented. We 
revised the Otsu method for selecting optimal threshold 
values for both unimodal and bimodal distributions, and 
tested the performance of the revised method, on the 
colour images segmentation. This algorithm is iterative 
and outperforms Otsu’s method by greatly reducing the 
iterations required for computing the between-class 
variance in an image. The experimental results for 
synthetic and biomedical colour images demonstrate the 
success of the proposed method, compared to many other 
methods. 

Keywords: Thresholding, Otsu ’s method, two stage multi- 
thresholding, colour image segmentation, space colour. 

1. Introduction 

The problem of image segmentation has received 
considerable attention in the literature [1], [2], [3]. It is 
one of the most important tasks which determine the 
quality of the final result of image analysis. In this 
framework, colour image segmentation has wide 
applications in many areas [4] [5], and several 

methodologies have been proposed to tackle this problem 
[6] [7] [8]. 

In their recent work, Cheng et al. [9] show that the color 
image segmentation can be considered as a pixel labeling 
process in the sense that all pixels belonging to the same 
homogeneous region are assigned to the same label while 
according to similar color. In color image segmentation, 
the pixel color is given as three values corresponding to 
the three component images R (Red), G (Green) and B 
(Blue). Many papers dealing with segmentation using 
different kinds of color spaces have been developed by 



several authors [9] [10], such as RGB space or their 
transformations (linear/non linear). Each color 
representation has its advantages and disadvantages. 
Nonlinear color transformations such as HSI and 
normalized color space have essential singularities which 
are non-removable, and there are spurious modes in the 
distribution of values resulting from nonlinear 
transformations. RGB is suitable for color display, but 
not good for color scene segmentation and analysis 
because of the high correlation among the R, G, and B 
components [11] [12]. 

Hence, linear spaces are very difficult to discriminate 
highlights, shadows and shading in color images. Using 
HSI can solve this problem to some extent except that 
hue is unstable at low saturation [13]. In fact, the main 
problem of the color image segmentation is the choice of 
the adapted color model for a specific application. 
Thresholding is widely used in many image processing 
applications such as (1) medical image applications [14]; 
(2) automatic visual inspection of defects [15] [16]; (3) 
optical character recognition [17], and (4) detection of 
video change [18] [19]. Thresholding is the simplest 
technique which involves the basic assumption that the 
objects and backgrounds in an image have distinct gray- 
level distributions. Many global thresholding techniques 
have been developed over the past years to segment 
images and recognize patterns [20] [21]. 

Otsu’s method, as proposed in [20], is based on 
discriminant analysis. The threshold value is selected by 
maximizing the separability of the classes in gray levels. 
This method provides satisfactory results when 
thresholding an image with a histogram of bimodal 
distribution, but it presents several challenging 
difficulties for multilevel thresholding selection in 
images. Otsu’s method can be easily extended to a multi- 
level threshold problem, but it is inefficient in 
determining the optimal thresholds, especially if there is 
a large class number M required in an image. To improve 
the efficiency of Otsu’s method, many methods have 
been proposed. For example, Liao et al. [22] presented a 
modified version of Otsu’s method called ‘‘recursive 
Otsu method” to find the threshold values by accessing 
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the pre-computed modified between-class variance 
through a look-up table (i.e., H-table). Moreover, Fan and 
Lin [23] proposed a hybrid optimal estimation algorithm 
to deal with the multi-level thresholding problem. With 
the same objective, Quweider et al. [24] presented a new 
algorithm based on dynamic programming and the 
optimal partitioning of the image data space on an 
interval of gray levels, which is commonly used in single 
and multi-level thresholding problems. This method can 
reduce the number of gray levels from a fine to coarse 
fashion, and is shown to offer very good results when 
compared to many existing methods. 

To significantly improve the deficiencies in Otsu’s 
method with regard to selecting the multi-level threshold, 
Deng- Yuan Huang et al. [25] proposed an algorithm 
called the Two-Stage Multithreshold Otsu method 
(TSMO). The TSMO method outperforms Otsu’s method 
by greatly reducing the iterations required for computing 
the between-class variance in an image. 

In this paper an investigation of the automatic 
thresholding technique is described. We use the TSMO 
method for segmenting the color images. So, this work 
may be seen as an extended of the method proposed by 
Deng- Yuan Huang et al. [25] for the color images. In 
their paper, the authors presented a fast algorithm of the 
gray level image segmentation, which is a modified 
version of the conventional Otsu method, but it 
determines the multi-level threshold in a two- stage 
fashion. Hence, this paper is devoted to the determination 
of the multi-level threshold, applied to color images 
segmentation that contains more than two classes. The 
idea is based on the histogram thresholding of the Hue 
component in the HSI color space. The concept of the 
Two-stage multithreshold Otsu’s method (TSMO method) 
was discussed in [25], which is used to finding the 
optimal multi-level threshold in a two-stage fashion. 

The rest of this paper is structured as follows: Section (2) 
briefly describes the conventional Otsu’s method. Section 
(3) introduces the proposed method for color images 
segmentation. The experimental results are discussed in 
section (4), and the conclusion is given in section (5). 

2. Conventional Otsu method 

The Otsu’s method [20], is good for thresholding a 
histogram with bimodal or multimodal distribution. This 
method, however, fails if the histogram is unimodal or 
close to unimodal. 

Assume that an image can be represented in L gray levels 
(1, 2, . . . , L). The number of pixels at level i is denoted 
by f and N represents the total number of pixels in a 
given image. For a given gray-level image, the 
occurrence probability of gray level i is defined as: 

Pi = A; Pi >0; Y p i = 1 (1) 

i = 1 

In the case of single thresholding, the pixels of an image 
are divided into two classes, C 1 and C 2 , by a threshold 
at level t, where class C 1 consists of gray levels from 0 
to t, and class C 2 contains the other gray levels with t+1 
to L. 

The cumulative probabilities ( w 1 and w 2 ) of the two 
classes, respectively, are given by: 




t 
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ixn 

II 
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(2) 
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w 2 = X P ‘ 


(3) 


i = t + 1 




The mean levels ( // 1 
can be computed as: 


and p 2 ) f° r classes C j and C 2 , 


ti 


(4) 
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i=t+ i z 


(5) 



Otsu [20] selects an optimal threshold t * that maximizes 
the between-class variance in equation (6) where 
ju T is the mean intensity of the original whole image. 

t* = ar g l<Hi maX V l{t)\(T 2 B = 

MiU - f + vv' 2 (// 2 - Mr f } (6) 

and 

L 

MT=Yj ip i (7) 

i = 1 

The Otsu’s method [20], described here can be easily 
extended to image multilevel thresholding [22]. 

3. Proposed method 

The Otsu’s method [20] still remains one of the most 
referenced thresholding methods. This method selects 
threshold values that maximize the between-class 
variances of the histogram. It gives satisfactory results 
when the numbers of pixels in each class are close to 
each other. However, with an increase of the number of 
classes in an image, this method becomes rather 
inefficient because it requires a large number of 
iterations to compute the cumulative probability (zero th - 
order moment) and the mean (first-order moment) of a 
class. 

Hence, this paper is devoted to selecting the multi-level 
threshold, applied to color image segmentation, where, 
we aim at providing a help to the doctor for the follow- 
up of the diseases of the breast cancer. The objective is 
to rebuild each cell from the Hue component of the 
original image represented in the HSI color space. So, 
this work may be seen to be straightforwardly 
complementary to that in the paper proposed by Deng- 
Yuan Huang et al. [25], and the generalization of this 
method to color images segmentation. 

HSI system separates color information of an image from 
its intensity information. Color information is 
represented by hue and saturation values, while intensity, 
which describes the brightness of an image, is 
determined by the amount of the light. Hue is the most 
useful attribute in color segmentation since it is less 
influenced by the nonuniform illumination such as hade, 
shadow, or reflect lights [26]. Hue can be obtained by a 
nonlinear transformation from Red, Green and Blue 
color features [26] : 
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Hue = arctan( 



(R - G) + (R - B) 



( 8 ) 



Hue represents basic colors, and is determined by the 
dominant wavelength in the spectral distribution of light 
wavelengths. It reflects the predominant color of an 
object and has a great capability in subjective color 
perception [26]. 

To significantly improve the deficiencies in Otsu’s 
method with regard to selecting the multi-level threshold, 
we use an algorithm called the Two-stage multithreshold 
Otsu’s method (TSMO method). 

A general concept of the TSMO method is given in Ref. 
[25]. The idea of this method is quite simple and 
straightforward: to greatly reduce the iterations required 
for calculating the zero th and first-order moments of a 
class. In the first stage of the TSMO method, the 
histogram of an image with L gray levels is divided into 
M z groups which contain N z gray levels. 

Let Q denote the groups of the total image space; then 
Cl = {q j | j = 0,1,... , M z - 1 } , where j represents the 

group number. Hence, each group contains a certain 
range of gray levels as follows: Cl 0 contains a range of 

gray levels {o ,1 N z - l} , Cl x with gray levels 
{N Z ,N Z + 1,..., 2 N z -l},..., Q q with gray levels 
{qN z ,qN z + 1,..., (q + l)N z - 1 },..., and the last 
group Cl M _ x with gray levels 

{(M z - 1 )N Z ,(M z - 1 )N Z + 1,...., M Z N Z - l}. 

The number of cumulative pixels (the zeroth- order 
cumulative moment), in the q th group denoted by 



gCl q can be calculated as: 




G? + 1 > N z - 1 




£ ft q ~ S f 1 


(9) 


i = q x N z 





where represents the number of pixels with gray level 
i . Since each group contains N z gray levels, the 
corresponding gray level value for each group can be 
considered as a mean value for those N z gray levels. 
Therefore, the corresponding gray level value or mean 
intensity (the first-order cumulative moment), in the q th 
group denoted by i Qq can be calculated as: 

(^ + 1>V,-1 /(q + l)xN Z -1 

= Z if ‘/ Z fi 

i = qxN z / i= qx N „ 

( 10 ) 



(«+l>w 2 -l 

= - — Z v, 

6 o q i=qxN z 

Hence, Otsu’s method can be applied to find the optimal 
threshold j* by maximizing the between-class variance 
(o-j) with the sets of i Q and g Q The optimal threshold 

j which is also regarded as the number of the group 
into which the maximum variance of the between-class, 
i.e., falls with the corresponding group Cl .* is 



defined as: 



/ = arg max 0 < j<m z -i {cr|0‘)j (ID 

If an image can be divided into two classes, C x and 
C 2 , by Q * , where class C x consists the group from 
Cl 0 to Cl .* , and class C 2 contains the other groups 
with Cl * +1 to Cl M _! , then the numbers of the 

cumulative pixels and the means for the two classes, 
respectively, are given by: 



"*(/-)= Z*v 


(12) 


7=0 








W 2 (/)= Z^' 


(13) 


7=/+ 1 




and 
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M= — 


(14) 
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S 2 

Ri= — 


(15) 



w 2 

where S x and S 2 are the first-order cumulative 
moments for classes C x and C 2 , respectively 



j 



Z^n, 


(16) 


7=0 




M z -\ 






(17) 


7=/+l 





Thus, the maximum variance of the between-class 

( g B ) max , can be easily found using the modified version 
of Otsu’s method proposed by Liao et al. [22]. In the 
case of bi-level thresholding (m = 2) , the maximum 
variance of the between-class is defined as: 

Vb fax = Z Wk ^ = + W 

k = 1 

= sl + sl ( 18 ) 

VV’i w 2 

_ ^i 2 t (jV ~ ) 2 

w x N - w x 



where M is the class number in an image, S T is the sum 
of S x and , and N is the total number of pixels in an 
image. That is 



Sj — iS*! + s 2 

N = w x + w 2 



(19) 



In the second stage of the TSMO method, since Cl .* 



contains the gray levels with (j )N Z to (j +1)A Z -1 in 

which (cr g )^ ax (/) occurs has already been found in the 
first stage, Otsu’s method can be applied again to group 
Cl .* in a similar fashion to found the optimal threshold 



t . Hence 
t* = ar g, 



U )N z <t<(j +l)N z -l 



max 



{( ct b ) 2 ( 0 } 



( 20 ) 
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Figure 1 . Flowchart of the proposed method 

In fact, the first phase of the proposed approach is the 
determination of the color feature hue. In the second 
phase, the TSMO method is used to cluster the feature 
Hue into several classes with every class corresponding 
to one region in the segmented image. 

Therefore, the pixels are divided into several groups with 
each group having similar colors. The proposed method 
can be described by a flowchart given in Figure 1 . 

4. Experimental results 

The described method has been extensively tested on 
color cells images, where, we aim at providing a help to 
the doctor for the breast cancer follow-up diseases. Some 
experimental results are shown in Figures 4-7. The 
images are originally stored in RGB format. Each of the 
primitive colors (red, green and blue) is represented by 8 
bits and has an intensity range from 0 to 255. We applied 
the proposed method to both RGB and HSI color spaces 
and compared the results with those obtained by using 
the Otsu’s method and Fuzzy approaches. 




(a) (b) 

Figure 2. Synthetic image (a) and real medical cells image (b) 




(C) (d) 



Figure 3. Segmentation results on a color image, (a) Original image 
(256x256x3) with gray level spread on the range [0,255]. (b) Red 
resulting image by TSMO-32 method, (c) Green resulting image by 
TSMO-32 method, (d) Blue resulting image by TSMO-32 method. The 
selected thresholds are 201, 116, and 122 respectively. 

The two images shown in Figure 2 were used in order to 
illustrate the stages of the segmentation algorithm and 
visually assess the quality of the segmentation results. 
The synthetic image (Figure 2(a)) contains 5 areas and 
can be considered as piecewise constant in most of its 
areas, the background intensity level is 70, 28 an 39 for 
the red, green and blue components, respectively. Figure 
2(b) shows a real medical cells image, where the noise 
statistically tested and found to be approximately 
additive Gaussian. 

Figure 3 shows the segmentation result based on the 
described algorithm of the image Figure 2(b) represented 
in the RGB color space applied to the red, green and blue 
color features, respectively. In this case, a region is 
recognized in green component but is not identified by 
red and blue components. This shows that the RGB color 
space has a strong correlation of its three components, 
and hence, the use of a single information source leads to 
bad results. 

Comparing the results, we can find that the cells are 
much better segmented in Figure 3(c) than those in 
Figure 3(b) and Figure 3(d). Also, the first resulting 
image contains some holes in the cells, which do not 
exist in another resulting image. This demonstrates the 
necessity to choose the color model and the segmentation 
algorithm adapted for a specific application. 

For purpose of comparison, we apply the proposed 
approach and some Fuzzy and automatic thresholding 
approaches to the same color image segmentation, using 
HSI color space. The latter methods include those of N. 
Otsu [20], Liao et al. [22], Lim et al. [27] and Liew et al. 
[28]. 

The algorithm proposed by Lim et al. [27] called 
(TFCM), is based on the Thresholding and the Fuzzy C- 
means (FCM) techniques. 
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(d) (e) (f) 



Figure 4. Comparison of the proposed segmentation method with other existing methods on a medical image (2 classes, various cells), (a) original 
image (256 x256 x3): color medical image with RGB description, (b) Hue resulting image by the HCM algorithm, (c) Hue resulting image by the 
FCM algorithm, (d) Hue resulting image by the TFCM algorithm, (e) Hue resulting image by the MFCM algorithm, (f) Hue resulting image by the 
proposed TSMO method. 



Table 1. Segmentation sensitivity From HCM, FCM, TFCM, MFCM 
and TSMO Methods for the Data set Shown in Figure 5 





HCM 


FCM 


TFCM 


MFCM 


TSMO 


Image 1 


63.61 


Sensitivity segmentation (%) 
74.75 76.82 78.45 


87.96 


Image 2 


62.26 


73.16 


75.18 


76.79 


86.09 


Image 3 


65.23 


74.30 


76.34 


77.98 


87.43 


Image 4 


63.08 


74.12 


76.17 


77.79 


87.22 


Image 5 


63.26 


74.33 


76.39 


78.02 


87.47 


Image 6 


83.67 


87.99 


89.23 


89.35 


89.38 


Image 7 


66.50 


73.44 


75.47 


77.08 


86.42 


Image 8 


69.11 


74.15 


76.20 


77.83 


87.26 


Image 9 


67.22 


78.99 


81.17 


82.90 


92.95 


Image 10 


71.73 


84.28 


86.62 


88.46 


99.18 


Image 11 


72.12 


84.74 


87.08 


88.94 


99.71 


Image 12 


71.66 


82.97 


91.83 


93.78 


95.32 



The methodology uses a coarse-fine concept to reduce 
the computational burden required for the FCM 
algorithm. Furthermore, to greatly improve the efficiency 
of FCM method, Liew et al. [28] proposed a new 
algorithm called the MFCM method. In fact, the MFCM 
method outperforms FCM method by using the local 
spatial information. This method improve the 
classification results and provides satisfactory 
segmentation results. 

However, the segmentation results obtained by the 
conventional Otsu method, the fuzzy algorithms and the 
proposed TSMO method as described earlier, are shown 
in Figures 4 and 5. 

Figure 4 shows the segmentation results obtained by the 
HCM algorithm [29], the FCM algorithm [5], the TFCM 
method [27], the MFCM method [28] and the proposed 




Figure 5. Data set used in the experiment. Twelve images were selected 
for a comparison study. The patterns are numbered from 1 to 12, 
starting at the upper left-hand corner. 



TSMO method applied to the color feature Hue of the 
Figure 2(b). They correspond, respectively, to Figure 4(b), 
Figure 4(c), Figure 4(d), Figure 4(e) and Figure 4(f). 
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(C) (d) 



Figure 6. (a) Original image 256x256x3 color synthetic image disturbed 
with a "salt and pepper" noise and with gray level zero to 255 of each 
primitive colors and 6 classes, (b) Resulting image by HCM method, (c) 
Resulting image by FCM method, (d) Resulting image by the proposed 
method. 

We notice that while using the proposed method, the 
regions are exactly and homogeneously segmented (see 
Figure. 4(f)). 

To evaluate the performance of the proposed 
segmentation algorithm, its accuracy was recorded. 
Regarding the accuracy, Table 1 lists the segmentation 
sensitivity of the different methods for the data set used 
in the experiment. 

The segmentation sensitivity [30] [31] (Sens %) is 
computed using 

Sens = NPcc xlOO (21) 

NxM 

where: Sens, Np cc and NxM correspond to the 
segmentation sensitivity (%), number of correctly 
classified pixels and the image sizes, respectively. 

To provide insights into the proposed method, we have 
compared the performance of the proposed method with 
those of the corresponding Hard [29] and Fuzzy C-Means 
[5] algorithms. The method was also tested on synthetic 
images and compared with other existing methods. 

The comparison of the proposed approach will be 
presented through the next experiment. Figure 6(b), (c) 
and (d) show the final segmentation results obtained from 
the HCM algorithm, the FCM and the TSMO-32 
algorithms, respectively, when a ‘’salt and pepper” noise 
of D density is added to the original image 7, shown in 
Figure 6(a). This affects approximately (D x (N xM)) 
pixels. The value of D is 0.02. 

Since HCM and FCM-based methods do not consider the 
spatial dependencies among the pixels and the 
compatibilities of the points belonging to the classes to 
compute the membership’s degree, there are some 
isolated pixels that are remaining. As seen in the 




experimental results, the proposed approach works better 
than the HCM and FCM-based methods. 

Comparing Figures 6(b), 6(c), and 6(d), we observe that 
the five regions are correctly segmented in Figure 6(d), 
showing the good selection of the optimal thresholds by 
TSMO-32 method. It can be seen from Table 1 that 
28.44% and 17.03% of the pixels were incorrectly 
segmented by HCM and FCM based methods, 
respectively, but only 04.68% are incorrectly segmented 
pixels by our proposed method. Comparing Figure 6(b) 
and 6(c) with 6(d), we can see that the image resulting 
from the proposed method is much clearer than the one 
resulting from the HCM and FCM based methods. 

Also, for comparison, we test the performance of the 
proposed method in RGB and HSI color spaces by 
applying TSMO-32 to red, green, blue and hue color 
features, respectively (see Figure 3). The major problem 
that RGB color space suffers is a high correlation of the 
three color features. Also, the segmentation results 
obtained by using the color feature RGB is performed on 
three dimensions, therefore it needs a triple processing 
time. In fact, using HSI color space can solve this 
problem. The color feature Hue can represent subjective 
color suitable for human observes. The result does not 
have correlation and the segmentation is performed on 
only one dimension, using less processing time. 

This could be observed in our experimental results. In 
Figures 3(b), 3(c) and 3(d), the different regions are 
misclassified during the segmentation using RGB space. 
Therefore, the same regions are segmented correctly 
using hue feature (see Figure 4(f)). This proves that hue 
is less influenced by shadow and highlight in an image. 
Consequently, the Otsu method requires a lot of 
computation time, when extended to a multi-level 
threshold problem due to the fact that a large number of 
iterations are required for computing the cumulative 
probability and the mean of a class. In the present study, 
the TSMO-32 method [25] is used to greatly reduce the 
iterations required for computing the between-class 
variance in an image, and to determine the optimal 
thresholds. In fact, for a bi-level threshold (M=2) of an 
image, the iterations for the conventional Otsu’s method 
[20], the TFCM method [27], and the TSMO-32 are 253, 
167 and 36, respectively. Although, for (M=5) of an 
image, the iterations for the conventional Otsu’s method 
[20], the TFCM method [27], and the TSMO-32 [25] are 

250 4 , 225 4 and 26 4 , respectively. Hence, the 
iterations of the conventional Otsu’s method [20] and the 
TFCM [27] are high compared to the iterations required 
by the TSMO-32 method. 

5. Conclusion 

In this paper, we have proposed a new method to color 
image segmentation based on multi-level thresholding 
technique. We extend the general idea of Otsu’s method 
to multi-level thresholding in a medical color images. In 
the first phase of the segmentation, the histogram of an 
image is divided into groups which contain the same 
number of gray levels. In the second stage, the separate 
search range of Otsu’s method should fall into the groups 
found in stage one. Hence, the optimal threshold for each 
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group can be found by the new fast algorithm called the 
TSMO method (Two-Stage Multi-level Otsu method). 
Experimental results are provided to demonstrate the 
effectiveness of the proposed method, and then a 
comparative study versus existing techniques is presented. 
The advantages of different color spaces, HSI and RGB, 
are also given. The color feature hue is proved to be more 
efficient than RGB color features by this research. The 
proposed method will find wide applications in image 
thresholding, segmentation, texture analysis, etc. 
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Abstract 

In this paper, we propose a method, which requires no 
interactive operation, to segment human object from 
multi- view video. Our method consist of two stages: for 
initial frame of the video sequence, we automatically 
extract object based on saliency model and iterated Graph 
cut. After having segmented object in first frame, we 
propose the algorithm combining Bayesian estimation 
and minimizing energy function using graph cut to 
segment object. In our energy function, the color, depth 
and spatial-temporal coherence are integrated in data 
term. Smooth term is encoded the penalty cost of the 
neighboring pixels with different labels. By combining 
Bayesian estimation, minimizing energy functions via 
graph cut is speeded-up. Experiment results on test 
sequences are encouraging. 

Keywords: Multi-view s/Stereo Object segmentation, 

Automatic Object Segmentation, Object/Foreground 
extraction 

1. Introduction 

Segmentation is one of the most common and active 
research topic in image processing image processing, 
computer vision for recently decades. Image/video 
segmentation can be classified into region-based 
segmentation and object-based segmentation. From the 
perspective of this study, video object segmentation can 
be classified into two main categories: segmentation in 
mono view and multi- view video. 

Most of interest has been focused on the research of 
mono-view segmentation leading to many advanced 
algorithms, theories and technologies. Especially object- 
based segmentation has drawn great attention from the 
research and industrial community, resulting in many 
commercial products with image cutout tools or user 
interface, such as Video SnapCut [1], Lazy Snapping [2], 
Ratio cut [3] and GrabCut [4]. These interactive object 
segmentation tools requires user intervention to provide 
foreground/background hints by brush stokes or 
bounding box, which can obtain high accuracy object 
regions. 

On the other hand, multi-view segmentation had not 
attracted much attention due to the limitation of capture 
technology. With the recent growing capability of 



capturing devices, multi- view capture system with dense 
or sparse camera array can be built with ease, which 
motivates the development of multi- view techniques and 
its related applications. One of challenging task but 
playing very important roles in many in multi-view 
applications (image-based rendering, 3D reconstruction...) 
is object segmentation. Based on the different 
methodologies involved, the existing algorithms of object 
detection and extraction from multi- view images can be 
grouped into three categories: background modeling, 
visual hull-based and depth-based. 

In this paper, we focus on depth-based multi- view object 
segmentation. Depth information recovered from multi- 
view image/video serves as an important cue for our 
segmentation algorithm. However, due to ill-posed nature 
of depth estimation, errors may occur in the depth map. 
To obtain more robust segmentation results for object- 
level manipulation, integration of depth, color, and other 
image cues should be considered. Depth reconstruction 
and multi-view segmentation is generally addressed in 
the sequential, joint or iterative fashion in a number of 
literatures. In [5], author described models and 
algorithms for bi-layer segmentation of stereoscopic 
frames. Stereo disparity is obtained by dynamic 
programming in Layered Dynamic Programming 
algorithm, and stereo match likelihood is then 
probabilistically fused with contrast-sensitive color 
model to segment each frame by ternary graph cut. In [6], 
multi-view scene reconstruction and segmentation are 
dealt with by joint graph-cut optimization for the 
challenging outdoor environments with moving cameras, 
for example, rugby and soccer scenes. Segmentation and 
depth labeling field are formulated into the unified 
energy function, which involves color and contrast term 
for segmentation, as well as the match and smoothness 
term for reconstruction. In [7], authors proposed a 
flexible and homogenous approach to simultaneous depth 
estimation and background subtraction in a multi-view 
setting, assisted by a static background image with 
known depth for each camera. The results of depth 
reconstruction and background separations algorithm is 
obtained as minimization of energy functional, to 
generation a dense depth map and foreground map. In [8], 
the estimated depth map and segmentation mask are 
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iteratively computed using an Expectation-Maximization 
(EM) algorithm. 

In this paper, we approach in straightforward way. First, 
depth is estimated and then the depth information is 
incorporated into the segmentation framework. This 
approach is reasonable because depth maps are becoming 
a readily available commodity of the multi- view pipeline. 
The purpose of this research is to fuse color, contrast 
with depth to robust object segmentation. The 
contribution point of our research is automatically 
created trimap by Bayesian estimation. Created trimap is 
initial value to speed-up graph cut optimization algorithm. 
The remainder of the paper is organized as follows: 
Section (2) briefly introduces background of graph cut 
theory. Section (3) emphasizes on our proposal 
segmentation algorithm. Experiments and conclusion are 
presented in section (4) and (5) respectively. 

2. Background of graph cut theory 

Before going to further, we briefly explain the graph cut 
theory. 

Graph cut based methods construct a graph topology to 
minimize the specified energy function activated by the 
max-flow/min-cut algorithm [8], so that the min-cut on 
the graph is of minimal energy among all the cuts 
separating the terminals. 

Theoretically, G-\V,E} is a directed graph with 
weighted edge capability. V is the vertices set including 
all the nodes as well as another two special terminals 
usually called sources and sinkt . Generally, nodes 
represent pixels (or voxels), and source and sink 
correspond to the labels assigned to the nodes. E is a 
directed edge set, which connects all the vertices in the 
graph. It contains two types of edges in the graph named 
t-link and n-link . t-link connects pixels with 
terminals or labels, n-link joins the pixels with their 
neighbors. All the edges in the graph are assigned 
weights or costs representing the edge capability. A 
S - T cut C with two terminals is a partition of nodes in 
the graph into two disjoint sets S and T so that 
s gS and t g T . 

In combinatorial optimization, the cost of a cut is 
calculated as the sum of the edge weights passing from 
S to T . The min - cut problem is to find a cut with the 
minimal cost. In [10], they implemented a fast algorithm 
for graph-cut optimization with public source code, 
which drives the extensive application of graph cut 
technique in various optimization problems. 

The general formulation of the energy function is defined 
as follows: 

£(/) = ^£ p (/ p ) Uj;£ M (/ f ,/ 4 ) (1) 

psP (p,q)^N 

where, / is the labeling field, P is the set of pixels, and 
A is the pixel’s neighborhood system. E p (f p ) is called 

the data term and corresponds to the t - link in the graph. 
It measures the likelihood of a certain pixel p assigned to 

the label f p and how well the label f p fits the pixel p 

given the observed data. The smoothness E p q (f p ,f q ) is 

represented by the n - link in the graph. It evaluates the 



penalty of disagreement between p and q which are 
assigned with f p and f q respectively, A is a parameter 
to weigh the importance of these two terms. 

3. Proposal segmentation algorithm 
3.1. Algorithm overview 

Figure 1 shows the entire algorithm. Our algorithm takes 
the color image of key view, depth image which is 
estimated from the multi- views as the inputs, and the 
segmentation result of the foreground region as the 
outputs. 




Figure 1 . The illustration of proposal algorithm 
In our framework, depth is estimated based on algorithms 
in [11], detail in section 3.2. But in many cases, depths 
are free given with multi- view colors images. 

For the starting step, if the input is first frames in 
sequences, we apply the novel method to segment object 
based Saliency model [16] and GrabCut [4], given detail 
in section 3.3. From the second frame, we propose a 
probabilistic model combining Bayesian estimation and 
graph cut algorithm, discussion in section 3.4. 

3.2. Depth estimation from multi-view images 

Depth estimation aims at calculating the structure and 
depth of objects in a scene from a set of multiple views or 
images. This topic has been attracted extensive attentions 
in research communities. A comprehensive survey and 
evaluation of dense two-view stereo matching algorithms 
can be found in [12]. For multi-view scene reconstruction, 
one of the top advance algorithms is using graph cut and 
is described detail in series of papers [10],[11],[12]. 

In this work, depth is estimated based on algorithms 
proposal in [11]. In [11], the data term enforces photo- 
consistency. Fet I be a set of pair of “nearby” 3D points. 
These points will come from different view, but they will 
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share the same depth (i.e., the points are of the form 
(p, f P } . (<?> f q ) where f p = f q and p , q are pixels 



from different views). Then the data term is: 

Y J D(p,q).T[f p =f q =l] (2) 

{((p,l),(q,l)el)} 

The smooth term, on the other hand, involves a single 
camera at a time. It is defined to be: 

(/,•/«) (3) 

{p,qhN 

where, N is a neighborhood system on pixels in a single 
view. 

The energy function in [11] is minimizing by expansion 
move algorithm [13], which is efficient approximation 
graph cut algorithms. Figure 2 shows on example of our 
depth estimation results. 




Left image Right image Depth image 



Figure 2. Stereo depth image 

Depth map provides vital information for scene 
interpretation; therefore, depth maps are becoming a 
readily available commodity of the multi- view pipeline. 
We can make use of this new free information to our 
algorithm. 

3.3. Saliency cut in the first frame 

Most of graph cut based object segmentation algorithm 
need user’s intervention to specify the initial foreground 
and background regions as the hard constraints such as 
in [2], [4] and [15]. User’ interaction is helpful to 
obtained good segmentation results, but the initialization 
itself may be annoying the user especially when much 
guidance is needed. 

In our proposal algorithm, the object will be 
automatically segmented but requiring only the object 
mask in the first frame in video sequence. To obtain the 
first frame object mask, we have two ways: manually or 
automatically locating and extracting object. For 
manually extracting objects, user can use some tools such 
as Lazy Snapping [2], GrabCut [4], or using simple 
background subtraction method with having background 
of the first frame. 

In this section, our purpose is automatically locating and 
extracting object in the first frame. Visual attention 
concept gives us with smart mechanism to perceptually 
attract human’s attention toward the location of 
interesting objects in a complicated scene. Saliency 
model in [16] is one of the earliest works. Give a static 
image, this model employs color, intensity and 
orientation to compute Saliency Map (SM), which 
encodes the obviousness at each location in the visual 
input. Until now, there are many saliency models have 
proposed such as in [17], [18], [19], [20]. 

In this paper, we apply the ideal given in [17] and [18] to 
compute the saliency map. The process is demonstrated 
in Figure 3. 




Figure 3. Object segment based on SM and GrabCut 



After having saliency map, we consider the use of this 
map to assist in salient object segmentation. Saliency 
maps have been previously employed for unsupervised 
object segmentation such as in [20], [21], [22]. 

In our approach, we iteratively apply GrabCut [4] to 
refine the segmentation result initially obtained by 
threshold, dilation and erosion operating and connected 
component labeling (CCL) on the saliency map (see 
Figure 3). Instead of manually selecting a rectangular 
region to initialize the process, as in classical GrabCut, 
we automatically define the bounding region based on the 
result of threshold and CCL on SM. 

Once initialized, we iteratively run GrabCut to improve 
the saliency cut result. We apply dilation and erosion 
operations on the current segmentation result to get a new 
“seed” for the next GrabCut iteration. Our experiments 
need 3 or 4 iterations to obtain good result. 

Our approach can be the semi-automatic object segment 
by given some guidance to GrabCut as in classical one 
when saliency model have failed to locate the desire 
object. 



3.4. Probabilistic model combining Bayesian 
estimation and graph cut 
3.4.1. Color and depth data models 

This session, we briefly explain the color and depth 
probability model using in our algorithm. 

Color data model: 

Gaussian Mixture Model (GMM) in RGB color space is 
used for color data model. We use two GMM models 
with K component, one for background and one for 
foreground. The likelihood of pixel p with color 

C p (r,g,b) (r,g,b: color values) belongs to the 

foreground or background can be written as: 

K 

p (C p \f) (4) 

k = 1 

where, / e \F , B\ representing foreground and 
background; w k j is a mixture weighting coefficient; and 

G(C p ;//y it ;Z/,t) is the k‘ h Gaussian component as: 

G(C p -, Mf ' k ,Z f , k ) 

1 (5) 

= 77Y ex P( _ (Cp -Pf, k ) T (X f , k )~\C p - fi f k )) 

(2*) 3/2 |Z /i4 | 



Given the GMM model 

i= ={F,B),k = (i.e. the 

weight w , means ju , covariance Z , and 2 K Gaussian 
component for background and foreground), we can 
calculate the likelihood P(C p \f) with / = {f,Z?} by 

using Equation (4). 
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Depth data model: 

Depth image is an array of grey values. Here we use 
histogram of grey values for depth data model h(p;f). 
The same as color model, we need two histograms, one 
for foreground and another for background. Histograms 
are normalized to sum to 1 over grey level range 

h(p; f) = land we get likelihood of pixel p with depth 

J P 

value D p as: 

P(D p \f) = h(p;f) with f = [F , B] (6 ) 



3.4.2. Bayesian estimation and trimap creation 

This section computes the probability of each pixel to be 
in foreground base on Bayesian estimation and the results 
are used to create the trimap, which is used for 
segmentation object via graph cut. 

Lets Cp , D l p are color and depth value of pixel p on 

color and depth images at time t . The probability of pixel 
p belongs to foreground is calculated based on Bayes’ 
formula as: 

P t (F\C p ,D p ) (7) 

P , (C p .D p \F)P , (F) 

” P‘ (C p -D p I F)P‘ ( F) + P t {C p .D p I B)P* (B) 
where, P(x) is probability of x , F and B stand for 
Foreground and Background in which pixel p belongs to. 

We assume that color and depth are independent, so the 
likelihood: 

P t (C p ,D p l F) =P t (CplF)P t (D p IF) 

( 8 ) 

P t (Cp,D p IB ) = P t (C p l B) P l (Dpl B) 

The likelihood P t (C p I F) and P f (C p I B) are calculated 

from foreground and background Gaussian Mixed Model 
(GMM) which are constructed from the previous segment 
result of color frame at {t - 1) . 

Similarly, the likelihood P^Dp IF) and P t (D p IF) are 

calculated from grey-level histogram which also 
constructed from previous segment result of depth image 
at(f-l). 

Because of successive frames in the temporal domain 
would have strong correlations, so the prior 

probability P*(F) and P ? (P) of frame at rare calculated 
from the previous image frame at (t- 1) . In order to get 

more accurate result, P f (F) and F ? (F) can inferred 
from smooth map the 2D mask of segmentation results 
in previous frame at ( t - 1) by performing Gaussian filter. 
Based on computed prior probability and likelihood 
probability, the posterior probability P l (F I C p , D p ) and 

P f (B I C p , D p ) are calculated by Equation (7). 

Applying this process for whole image pixel p , we get 
the probability image I prob (p) . Based on probability 



value of pixel p , I prob (p) the trimap T { F : Foreground; 
B : Background; U : Uncertainly} can be created by: 



Inrot, (p) ^ threshold 



prob 

Iprob (p) > 1 - threshold 



( 9 ) 



T(p) = B 
T(p) = F 

otherwise F(p) = U 

Here, threshold can be very small real value, in our 
experiment we choose threshold = 0.005. To remove the 
noise in trimap T , a filter to applied to foreground region 
and background regions. 




( 10 ) 



Figure 4. Trimap (back color: background; white color: foreground; 
grey color: uncertainly) 

3.4.3. Object segmentation by graph cut 

In this part, we focus on setting the data term and smooth 
term for our algorithm. 

Data term E p (f p ) \ measures the agreement between the 

segmentation labels and observer data. We define this 
term as the weighted sum in following equation: 

E p (f P Y = «X{-log(P(C p lf p )) -log(P(D p lf p )) 

peP 

+ {\-a)'Y J -\og{I prob {pj) 

peP 

where, the likelihood P(C p I f p ) and P(D p I f p ) can be 

obtained from GMM for color cue and histogram for 
depth cues as describing in section 3.4.1. Different from 
others previous related works, we add new 
term I b (p) as in the second row of Equation (10). This 

term encodes spatial temporal coherence to improve the 
segmentation result and can be reused the probability 
image in the creating trimap step. 

Smooth term E pq (f p ,f q ) : measures the penalty of two 
neighboring pixel p and q with different labels and is 
defined as follow: 

E p,q(fp’f q ) = 

^?./ [4,/ ' I p^b expM|c '- c ’ l|) (11) 



where, 

r X-h <12) 

||<Fs 7 ?(/?,g)|| and |c p -C^|| are Euclidean distance of 

neighboring pixels in coordinate and color space 
respectively. N is the set of pairs of neighboring pixels. 
In practical experiment, a good results are obtained by 
defining pixels to be neighbors in 8-way connectivity 
(horizontal, vertical and diagonally). In [15], they had 
shown that it is more effective to set J3 > 0 since this 
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relaxes the tendency to smoothness in region of high 
contrast. We choose ft the same in [15] as follow: 

p = ^C p -C q ) 2 )Y (13) 

This choice of ft ensures that the exponential term in 
Equation (11) switches appropriately between high and 
low contrast. Note that if (5 - 0 , the smooth term is well- 
known Ising model, which encourages smoothness 
everywhere [4]. 

In our algorithm, the automatically generated trimap 
T - \r F ,T b ,Tjj) (see section 3.4.2) is set as the initial 

values for graph cut. This is different point of our 
algorithms comparing to interactive segmentation method 
such as GrabCut [4], which requires considerable degree 
of user interaction for supplying trimap. With our trimap, 
Graph cut optimization only need perform in uncertainly 
region T v . However, Tjj is smaller region than whole 

image, so the time for graph cut optimization decreases 
considerably. 



4. Experimental Results 

To evaluate the performance of our method, we compare 
the segmentation results of our method with respect to 
the ground truth in the IU sequence, which can be freely 
downloaded from [24]. We define the Absolute Mean 
Error Rate (AMRE) of every fifth frame (in the left view) 
as the number of misclassified pixels over the total 
number of pixels in the image, which is the same 
measurement adopted in [5]. 

4 



AMRE = — ^ 



misclassified pixel w.r.t ground trust 
total number of pixels in images 



(14) 



We make a comparison of two cases: 1) using only 
color/contrast cues, 2) fusing both color and depth cues. 
The comparison shows in Figure 5. 




frame number 

— ■ — using color cue fusing both color and depth 



Figure 5. Quantitative comparison 

Using only color/contrast is a simply our algorithm in 
which the depth is switched off. As in Figure 5, when 
depth and color are fused we can get the better 
segmentation results. The segmentation in the case using 
both depth and color is more stable than only 
color/contrast case because absolute mean error rate is 
not much varying. 

Many frames in IU sequence, the background is non- 
stationary (there are other people moving in the 
background), but our algorithm is also working well on 
these frame, as demonstration in Figure 6. 




Frame 21 in IU sequence 




Frame 70 in IU sequence 

Figure 6. Segmentation works well with non- stationary background 



Besides using the IU sequence given in [24], we also set 
up the system to capture stereo video to show result of 
our algorithm. A common Minoru 3D webcam [25] is 
used to capture stereo video. Bouguet’s algorithm built in 
OpenCV [26] is used for camera calibration. 
Segmentation in our captured video also shows good 
results as in Figure 7. 




Figure 7. Segmentation on stereo video captured by Minoru 3D webcam 



We are not focusing on optimizing the running time, but 
we measure the time which Graph cut process is 
initialized by our trimap comparing with the case the 
graph cut algorithm is applied over whole image. Our 
proposal is faster, about 100 milliseconds for each frame. 
All our implementations use C++ and OpenCV [26]. 

5. Conclusion 

In this paper, we have introduced complete framework to 
automatically segment the human region in multi-view 
video. Depth is estimated from multi- view image by 
graph cut. Saliency model and iterated Graph cut are used 
to automatically locate and extract the interested object in 
the first frame to trigger the whole process. Depth is 
fused with color and spatial-temporal coherence in 
energy function. By combining Bayesian estimation, 
trimap is created automatically and used as initial value 
to speed-up minimizing energy functions via graph cut. 
Experiment results on various test sequences are 
encouraging. Our future work will be focused on 
improved this algorithm for multi-objects segmentation 
and efficient way to project the segmentation result in 
key- view to another views. 
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Abstract 

Retinal disorders are among the most dangerous diseases 
because many of them may lead to blindness if not early 
diagnosed and managed. Medical screening is very 
important as it allows the early diagnosis of the disease. 
Only ophthalmologists who have enough experience can 
differentiate between normal and abnormal retinas at the 
early stages of the disease. So, screening routines of the 
retina are very expensive and rarely to be done. In this 
work we provide a computer-aided screening system for 
the retinal disorders. The system could be used easily by 
the young physicians as it can automatically detect the 
early symptoms of abnormalities such as microaneurysms, 
hemorrhage and exudates. Using newly proposed image 
processing algorithms, we implemented a screening 
method based on the experience of the ophthalmology 
experts. 1 

Keywords: Computer-aided diagnosis, hemorrhage, 

microaneurysm detection. 

1. Introduction 

Blindness is an intimidating problem. Many life 
difficulties and risks could be caused by blindness. 
According to the World Health Organization (WHO), 
blindness is defined as a corrected visual acuity below 
3/60 on the Snellen scale for the best eye, or a central 
visual field diameter of less than 10°. It is estimated that 
about 180 million people worldwide have a visual 
impairment. Of these, 45 million persons are blind and 
135 million are partially sighted. These figures are 
expected to double over the next 25 years due to 
combination of an increasing population and aging 
worldwide. The most common causes of blindness are 
cataract, glaucoma, and retinal disorders such as age- 
related macular degeneration (AMD) and diabetic 
retinopathy (DR) [1]. While cataract and glaucoma can 
be treated surgically, there is no curative treatment for the 



1 This study has been implemented on Matlab platform 
at Biomedical Engineering lab. NILES - Cairo university. 



macular degeneration and the retinal vascular diseases 
till now. So, AMD and retinal vascular diseases, 
specially diabetic retinopathy, are the major causes of 
untreatable blindness worldwide. To prevent blindness 
caused by retinal disorders, they should be early 
diagnosed and then the spread of the disease can be 
stopped. So, screening of the retina is of most importance 
in this aspect. Unfortunately, screening of the retina is 
costly and should be achieved by highly qualified 
ophthalmologists. This motivated us to design and 
implement a computer-aided screening system that can 
automatically detect the abnormal symptoms on the 
retina in their early stages. This system could be used by 
young physician because it depends only on capturing 
digital images of the retina. These images can be 
processed by the system to decide whether the patient 
has the markers of retinal disorders, that might lead to 
blindness if left untreated, or not. The system is assumed 
to be able to determine the severity of the disease and the 
urgency for management. 

Since AMD and (DR) are the major causes of blindness, 
we provide robust methods for the early detection of 
such diseases. In their early stages, these diseases are 
characterized by the presence of microaneurysms, 
hemorrhage and exudates in the retina. While exudates 
appear as yellow irregular regions in the fundus 
photgraph, microaneurysms and hemorrhage appear as 
red spots with irregular boundaries for the later as shown 
in figure 1. According to how much these symptoms 
appear on the retina, the severity of the disease could be 
estimated. In a previous work, we published a new 
method for the detection of blood vessels and exudates in 
retinal images using morphological image processing 
techniques [2]. 

In this work we propose some methods for the detection 
of hemorrhage and microaneurysms together with the 
presentation of the screening system. The aim was 
originally to build a screening system that can be used 
easily for quick assessment of retinal diseases. Screening 
systems are very useful tools in medicine as they save 
time and efforts of medical experts. Young doctors 
always use screening systems for initial diagnosis and 
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deciding whether the investigated patient needs more 
diagnosis and treatment or not. 




Figure 1. image of a retina with microaneurysms, hemorrhage and 
exudates pointed to. 

Several techniques have been developed for the detection 
of hemorrhages and microaneurysms in fundus images. 
Joshi et al. in [3] used image subtraction to extract the 
blood vessels and hemorrhages. They also used 
mathematical morphology to suppress blood vessels and 
to highlight only hemorrhages. The authors in [4] used 
feature extraction based on pixel classification to detect 
red candidates and distinguish between microaneurysms 
and hemorrhages using k-nearest neighbor classifier. 
Region growing segmentation technique, intensity 
threshold and edge enhancement are used to extract red 
lesions are presented in [5]. Then they differentiate 
between hemorrhage and microaneurysms using neural 
network classification. Shade correction and image 
normalization of the green channel then diameter closing 
is used to detect candidates which are classified into real 
microaneurysms and other objects in [6]. Meindert et al. 
[7] compared the results of five different methods, 
produced by five different teams of researchers on the 
same set of data for microaneurysms detection. In one 
method, normalization of the green component of the 
retinal images using median filtering. Then, vessels are 
detected and removed using a top-hat transform and 
morphological reconstruction. The candidates are 
obtained and each candidate is segmented using region 
growing technique. A new approach based on multi-scale 
correlation filtering (MSCF) and dynamic threshold was 
developed in [8] to detect microaneurysms. Automated 
microaneurysm detection method based on double-ring 
Filter was achieved by the authors in [9]. In [10], a 
method to detect the hemorrhages using hue saturation 
value (HSY) space was proposed. Automated fundus 
photograph analysis algorithms for the detection of 
primary lesions and a computer-assisted diagnostic 
system for grading diabetic retinopathy (DR) and the risk 
of macular edema (ME) are introduced [11]. 

In this work, we propose a fast and an accurate method 
for the detection of microaneurysms and hemorrhages in 
fundus photographs. The detection of such disorders 
could be greatly improved by the new pre-processing 
algorithm which we propose in this paper. The power of 
the proposed pre-processing technique comes from the 
fact that it selectively enhances the contrast of 
microaneurisms and hemorrhage. Because the presence 
of microaneurysms and hemorrhage is the early signs of 



diabetic retinopathy, their detection would be an 
important issue for screening purposes. 

The of the paper is organized as follows: The materials 
and methods used in this work, are explained in details in 
the following section. The results are presented and 
discussed in the last section. Finally, we conclude our 
work and point to our future work. 

2. Materials and Methods 

A dataset of about one hundred retinal images were 
collected for this work. They include most of the retinal 
disorders as well as some images of normal retinas. 
Some of the image in this dataset were collected using a 
VISUCAM Carl Zeiss fundus camera located at the 
biophotonics lab - NILES - Cairo University. The others 
were collected form the free access database STARE. 
Fundus images usually taken as colored or graylevel 
images. Often, The green component of the colored 
image has the highest contrast and more information 
because it’s like red- free illumination images. Normally, 
the blood stream in the graylevel images appear as dark 
objects in brighter background. While in the dye-injected 
patients, they appear as bright objects in darker 
background which results in higher contrast because of 
the fluorescence effect. Trials are always done to 
minimize the use of the dye because of cost and patient 
comfort requirements. So, researchers try to increase the 
contrast of images taken without the injection of a dye. 
Also, it is difficult to the physician to capture a high 
quality fundus images because of many reasons such as 
non uniform illumination, the limited control of the 
patient movement and the focusing required from the 
patient and for the system itself. So, image pre- 
processing is always required to enhance the images. 
Proposed method: In an attempt to overcome the above- 
mentioned drawback related to enhancement of retinal 
images, we tried many of the pre-processing techniques. 
As we are interested in detecting dark objects such as 
microaneurisms and hemorrhage, we proposed a new 
pre-processing technique that can selectively increase the 
contrast of such objects. Segmentation of the anatomical 
structures and lesions can then be achieved using suitable 
algorithms. The steps of the proposed method are 
described below. 

Pre-processing : The images collected for this work were 
investigated to select the images with hemorrhage, 
microaneurysms and exudates. The images were then 
pre-processed to selectively increase the contrast of 
micoaneurysms and hemorrhage because of the interest 
in detection of these lesions. We proposed a new 
preprocessing technique based on frequency domain 
filtering. As it is well known, if a high pass filter is 
applied to an image, it will sharpen this image. That is to 
say that the contrast of objects in relation to their 
backgrounds will be increased. So, high-pass filtering of 
an image increases the intensities of the objects while 
decreasing the intensities of their backgrounds as shown 
in figure 2 c. So, subtracting a high pass-filtered image 
from the original image would smooth the bright objects 
in relation to their backgrounds while decreasing the 
intensities of dark objects in relation to their 
backgrounds. Since we are interested in detecting 
microaneurysms and hemorrhage which appear as dark 
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objects in the graylevel images, we can use this filtering 
and subtraction technique to selectively increase the 
contrast of dark objects such as microaneurisms and 
hemorrhage wile minimizing the contrast of bright 
objects such as exudates and optic disk. The pre- 
processing method can be described using the following 
equation: 

g(x, y) = f(x, y) - af hpf (x, y) 

Where: f(x,y) is the original graylevel image, fhpf(x,y) is 
the high pass-filtered image multiplied by a factor a and 
g(x,y) is the resultant image. Figure 2 shows the original 
colored image and its green component in the upper row 
and the result of high pass filtering and the final result in 
the lower row. From this figure, we can observe that the 
contrast of the bright objects (exudates) is increased. This 
filtered image is then multiplied by 2 and subtracted from 



the original graylevel image. This leads to decreasing the 
contrast of exudates and enhancing the appearance of 
microaneurysms and hemorrhage in the final image. 
Segmentation of retinal images: A good screening 
system should be able to detect any type of abnormality 
associated with diabetic retinopathy with high sensitivity. 
The most important signs are microaneurysms, 
hemorrhages (red lesions) and exudates (bright lesions). 
Also it is important to extract the normal structures such 
as the blood vessels tree, the optic disc and the macula. 
In a previous work, we provided robust algorithms to 
detect the optic disc, the blood vessels, the macula and 
the exudates [2]. To get a good screening system, we 
provide some methods to detect the other two 
abnormalities: microaneurysms and heamorrhages. 




Microaneurysms appear as red spots of circular shape 
with similar color as blood vessels in color fundus 
images. Their diameter is smaller than 125 pm. They 
may appear as disconnected from the blood vessels tree 
because they are situated on capillaries, and capillaries 
are not visible in color fundus images [6], [12]. Also 
heamorrhages have the same color as the blood vessels 
but with irregular shapes. The difference in shape and 
size between microaneurysms and heamorrhages help us 
to distinguish between them. 



After applying the pre-processing step described above, 
microaneurysms and heamorrhages will be of higher 
contrast. As microaneurysms and heamorrhages are dark 
regions in a bright background, we can detect them by 
using morphological filing operation on graylevel images. 
If the original is subtracted from the filled one, then we 
have an image that contain all the dark regions (blood 
vessels tree, macula region, microaneurysms and 
heamorrhages). We can obtain a binary image by simple 
thresholding as demonstrated in figure 3. 
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Detection of blood vessel tree: Now, we need to extract 
the blood vessels from the result image to obtain an 
initial estimation of candidate regions (microaneurysms 
and heamorrhages). This could be done by closing the 
binary image using two line structuring elements of 
different sizes rotated by different angles as in described 
in [2]. The detected blood vessels is shown in figure 4 a. 
Macula detection: The macula is the area of acute vision 
within the retina. It appears as dark and homogeneous 
area near the optic disc, approximately 2.5 times the 
diameter of the optic disc from the centre of optic disc 
[13]. It can be extracted by region growing algorithm. To 
segment the macula, we manually select a point in the 
fovea (the center of the macula), then by a region- 
growing algorithm, the macula was extracted as the 
connected region around the fovea as described in [14]. 
Figure 4 b shows the result of macula detection. 




Figure 4. Initial estimation of microaneurysms and heamorrhages: a. 
blood vessels detection, b. macula detection and c. initial estimates of 
lesions. 



Detection of microaneurysms and haemorrhages: After 
the removal of blood vessels and macula from the image 
obtained by morphological filling and segmentation, the 
result is an initial estimation of microaneurysms and 
heamorrhages as shown in figure 4 c. The final detection 
of these lesions can be obtained by applying the 
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morphological reconstruction algorithm described in [15, 
16] and given by: 

h M =(h<sb) n/„ m 

Where, hk is the marker image at the kth iteration (hi is 
the image contains the inverted initial estimate of 
microaneurysms and heamorrhages superimposed on the 
inverted original image), b is the structuring element and 
Iin is the input image (the inverted green component). 
This is an iterative process which must be repeated until 
no changes occur in h. The final iteration result is then 
subtracted from the inverted green component and 
thresholded to get the final estimate of microaneurysms 
and heamorrhages. Figure 5 shows the different steps of 
this method. 




c. d. 



Figure 5. Image reconstruction: a. mask image Iin , b. marker image, 
c. the result of reconstruction process and d. final estimation of 
microaneurysms and heamorrhages after thresholding. 



Differentiation between microaneurysms and 
haemorrhages: The obtained binary image contains a 
final estimation of microaneurysms and heamorrhages 
together. Now we need to separate the microaneurysms 
from heamorrhages and remove any false features come 
from noise. Microaneurysms are close to circular 
structures with small diameters but heamorrhages are 
irregular shapes. To separate them, all the connected 
components in the binary image are detected. For each 
connected components, the area, the minor axis length 
and the major axis length are to be determined to check 
the circularity of each connected component. A 
component is said to be circular if its c value given by 
equation 3 tends to 1 . 



c = abs( 1 - 



Major Axis Length - MinorAxisLength 



( 3 ) 



m ax (Ma jo rAxis Length , MinorAxisLength 
The result of differentiating microaneurysms from 
hemorrhage is demonstrated in figure 6. 

Results: As mentioned above, the aim of this work is to 
develop a retinal screening system that can help early 
diagnose of retinal disorders to prevent blindness. Major 
steps in such system is to provide some means by which 
the abnormalities can be detected. Since we are interested 
in detecting the retinal abnormalities that can lead to 
blindness, we implemented some algorithms that can be 
used to accurately detect the exudates, microaneurysmss 
and hemorrhage as early signs of retinal disorders. 



a. 




b. 

Figure 6. a: microaneurysms and b: hemorrhages. 

A new pre-processing algorithm based on subtracting 
high pass-filtered of the green component from the 
original one is proposed. The application of this 
algorithm to retinal images, enhances the appearance of 
the dark spots which represent the hemorrhage and 
microaneurysms in fundus photographs. As 
demonstrated in figure 2, the result of this pre-processing 
algorithm increases the accuracy of red lesion detection. 
This algorithm has been tried on our dataset and the best 
results of red lesion enhancement were obtained using 
values of 1 and 2 for the factor a which is the 
multiplication of the high-pass filtered image. 

A combination of the morphological algorithms used for 
the detection of hemorrhage and microaneurysms is sued 
in this work to obtain more accurate results than that 
obtained previously. This resulted in increasing the 
sensitivity of red lesion detection in the fundus 
photographs. Together with our previously published 
exudates detection algorithms, these algorithms are used 
in our proposed screening system. The system provides a 
computer program with a user-friendly interface that help 
physician in diagnosis and treatment planning. The 
system can capture fundus images from earners, process 
them and save them for further analysis and study. It also 
provides general image processing technique as 
enhancement and montage synthesis. A snapshot of the 
screening system is shown in figure 7. In this figure, the 
captured image can appear in the input frame. Many 
images from a video sequence can be taken and put into 
a montage to get the full extent of the human retina for 
better diagnosis and treatment planning. At the lower left 
of the figure, the processed retinal image are displayed. 
The first demonstrates the blood vessel tree as this is of 
great importance for the ophthalmologists. In the second 
image, the blood vessel tree, the optic disc and the 
macula are segmented for further investigation that can 
be done by the physicians. Lesions are detected and 
presented in different colors in the third image. Exudtaes 
appear as yellow structures where hemorrhage and 
microaneurysms are presented at different degrees of red 
color. 
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Figure 7. a snap shot of the screening system. 



3. Conclusion 

In conclusion, we provided a new image pre-processing 
techniques to selectively improve the detection of 
microaneurisms and hemorrhage as early signs of 
diabetic retinopathy. The technique is based on 
sharpening the image using high-pass filtering. Then, 
selective smoothing of the bright objects in relation to 
their background while increasing the contrast of the dark 
objects. This could be efficiently used to improve the 
detection of microaneurisms and hemorrhage in retinal 
images using common segmentation algorithms. The 
proposed method together with our previous work used to 
detect exudates in retinal images can be used to 
implement a screening system for the retinal diseases. 

This is the conclusion. The major contributions of this 
article arise from the formulation of a new 

approach, , to the modelling and 

identification of XYZ and ABC features that provide 
improved computational efficiency in the positioning 
techniques. By manipulating the manner in which feature 
information of both A and BVN signatures is 
incorporated into the model, it can be shown that 
significant improvements in the performance of the 
algorithm can be realised. Moreover, the simplicity and 
the efficiency of dynamic pose tracking techniques 
succeeded to improve the robot pose estimation process. 
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Abstract 

The widespread of new image technologies and data 
exchanges has created the need for new techniques able 
to insure copyright protection and data owner 
identification. Considering the deficiencies of 
steganographie and cryptography, watermarking is 
nowadays a new technique that provides an efficient 
mean for resolving these problems. Watermark 
embedding techniques depend on the representation 
domain of the image (spatial, frequency, and multi- 
resolution). Every domain has its specific advantages and 
limitations. Moreover, each technique in a chosen domain 
is found to be robust to specific sets of attack types. The 
ultimate goal of each technique is to allow the embedded 
watermark to be invisible and robust against a wide range 
of attacks. The most damaging unintentional attacks are 
compression techniques. Many of these last techniques 
are based on the new coding procedures using Multi- 
resolution domain. In order to develop a watermarking 
technique robust against large sets of attacks, we must 
exploit this domain to overcome these attacks constraints. 
In this paper, a study is developed considering all the 
parameters contributing in the watermarking scheme. An 
optimal point allowing robustness and imperceptibility of 
the embedded watermark is established. Using these 
optimized parameters any technique based on the multi- 
resolution domain can reach the maximum of 
imperceptibility and robustness. These results can 
immediately be injected in previous algorithms to 
improve their results or to build new techniques 
outputting directly better results. 

Keywords: robust Image watermarking, wavelet 

transform, Stirmark attacks, parameters optimization. 

1. Introduction 

The increased commercial activity on Internet and media 
industries, demand protection of media such as images, 
video and audio against illicit processing and use. Digital 
representation of copy-righted materials offers various 
advantages; however, the fact that an unlimited number 
of perfect copies can be illicitly produced is a serious 
threat to the right of content owners. [5, 14, 24, 16 and 
10]. Watermarking is used for owner identification and 
authentication by verifying whether the data has been 




used, changed or altered [15 and 17]. A research field 
whose results could undoubtedly help in the design of a 
watermarking algorithm is image compression. The 
image compression is based on removing from images all 
the data which are perceptually irrelevant with respect to 
the human system visual [1]. On the other hand, 
embedding an imperceptible watermark is made in such a 
way that it remains visually invisible and does not 
damage the image contents. As a matter of fact, the new 
generation of image coding such as standard JPEG2000, 
MPEG-4 [4 and 7], rely strongly on DWT to obtain good 
quality images at low coding rates. In order to avoid that 
the embedded watermark disappears when applying a 
compression to the image, the use of the same domain 
(DWT) for watermark embedding where the most used 
compression algorithms are based is strongly 
recommended [6]. This procedure allows following the 
steps of image coding and avoids inserting a watermark 
in the DWT coefficients that are removed after a 
compression is applied [2]. However, a major problem 
must be resolved. How can we choose the different 
parameters taking part in the watermarking procedure? In 
fact if one of these parameters is poorly selected it is 
difficult to reach the uppermost of imperceptibility and 
robustness of the embedded watermark [3 and 18]. In this 
paper, a study trying to optimize all the parameters 
intervening in the watermarking process is conducted. An 
equilibrium point generating the optimal parameters for 
the embedding procedure is established. These 
parameters are injected in the watermarking algorithm in 
order to reach the best equilibrium between the 
imperceptibility and robustness criteria. By the use of 
these parameters the algorithm is found to be robust 
against synchronous and asynchronous transforms. The 
attacks are generated by the STIRMARK tool [13, 25]. 
This paper is organized as the following: In section II an 
overview of watermarking techniques in the DWT 
domain is presented showing the state of art and the 
different parameters related to the watermarking process. 
The experimental configuration and the simulation results 
aiming to optimize all these parameters are presented in 
the third section. The privileges that this optimization 
offer and its efficiency when injected in a watermarking 
algorithm to increase robustness and maintain the 
imperceptibility under the visual threshold is summarized 
in the conclusion. 




Graphics, Vision and Image Processing Journal, Volume 12, Issuel, ISSN 1687-398X, Delaware, USA, April 2012 



2. Watermarking in the DWT domain 

1. Overview 

DWT has gained interest among watermarking 
researchers because of its ability to model the HVS 
behavior. As witness to this domain importance, 
countless papers proposing different algorithms have 
been proposed over the last few years. Many of them take 
inspiration from the most used wavelet based 
compression algorithms. [9], [20] and [22]. Other 
algorithms hide into images binary logos which are also 
hierarchically decomposed [12]. A binary logo is 
decomposed by DWT, and added several times into the 
sub-bands of the DWT decomposition of the host image. 
In [8], the binary logo and the image are hierarchically 
decomposed through DWT; each detail sub-band of the 
logo is then embedded into the corresponding one of the 
image. This is a non blind technique. In [11], a binary 
watermark is coded in selected coefficients of the detail 
bands. In the extraction process the watermark is 
recovered by analyzing coefficients quantization, a 
threshold is fixed to decide about correlation value 
between the original watermark and the extracted one. An 
excellent tutorial on the possible approaches to exploit 
HVS characteristics for watermarking images and videos 
is presented in [21], a variable strength of the coded 
watermark on each DWT band according to HSV is 
developed. In [13] the authors selected the low pass band 
to insert the watermark. A difference is imposed among 
the mean values of two equally sized, randomly selected, 
subsets of the low pass image; a blind technique is 
presented. 

In [19], the watermark is embedded using the LSB 
technique of DWT sub-band coefficients, the selected 
modified bits are chosen with respect to the HSV. As a 
substitutive technique watermark extraction is blind. In 
all these proposed techniques and in other ones the 
magnitude of the different parameters and coefficients 
used depends only on the embedding procedure and the 
image characteristics. The increase of each parameter is 
thresholded by the appearing distortions caused by the 
inserted watermark. But when several parameters 
intervene in the algorithm they may interfere. For this 
reason we must not chose for every parameter alone but 
the optimal values that lead to an equilibrium between 
them which allow us the maximum of robustness against 
attacks and the minimum of perceptibility that must 
remain under the psycho-visual invisibility threshold. 
Three main and essential parameters are used in most 
DWT watermarking algorithm: the embedding strength 
factor, the decomposition level and the sub-band used for 
insertion. Many techniques try to fix one or two of these 
parameters and vary the last in order to maximise the 
robustness of the algorithm without damaging the 
watermarked image. This procedure will decrease the 
reliability of the proposed algorithm even if they are 
satisfying but less that we can get by a parameters 
optimisation [23]. In this paper all these variable 
parameters are tested and optimised through experimental 
results. The results evaluation is based on three essential 
criterions: the psycho- visual sensitivity, the PSNR factor 
and the resistance against synchronous and asynchronous 
attacks. Taking into the consideration the wavelet kinds, 
equilibrium between the cited criteria can be obtained 



basing on this research in order to generate optimal 
watermarking shames. 

2. Watermarks database 

A watermarks database with 50 different watermarks is 
built up. These used watermarks are binary images with 
size equal to PxP, with P=50 as shown by the following 
equation and figures: 

W L ={W L (ij), 0<i,j<p}, W e {0, l} (1) 

E.S.S.T.T CEREP 

Figure 1 : used watermarks 



These watermarks includes personal and general 
information concerning the research unit name.. etc. 

3. Experimental configuration 
Many experiments and tests are developed. The 
watermarking procedure is as follows: Once the original 
image is transformed in the time-frequency domain after 
applying lossless wavelet decomposition, the watermark 
is coded. After recovering the spatial representation of 
the watermarked image many STIRMARK attacks are 
applied in order to test the robustness of the embedding 
algorithm. These tests in addition to other distortions 
measurements allow as in each test checking and finding 
the equilibrium point between the involved parameters. 
The following equation presents the watermark inserting 
procedure: 

fjij) =fmean+(f( i J) ~ f,nean M 1 + OC.W t ) (2) 

f w (i 9 j ) represents the watermarked image, f mean is the 
mean of the used sub-band coefficients, a is the 
embedding strength and w k is the inserted watermark. 

This described procedure is repeated in each test. In every 
set of tests one parameter is varied, the other ones are fixed 
and the optimum is checked. 
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Figure 2: Steps of watermarking scheme 



3. Experimental implementation and tests 

In these experiments five steps are planned. In the first we 
chose the optimal wavelet that generates no loss in the 
decomposed signal and less distortion to the watermarked 
image. In the second step we determine in which sub-band 
of details the watermark is coded with respect to a better 
compatibility with the human system visual. Embedding in 
the sub-band that generates less distortions allow us to 
maximize the amount of coded data and increase of the 
embedding potency. The better embedding equation is 
fixed in the third step. The equation that engenders fewer 
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distortions with the same embedding force is chosen. This 
embedding force called gain factor are optimized in the 
forth step. In the last step, the wavelet decomposition level 
is optimized. Figure 2 describes theses steps. 

1. Optimal wavelet choice 

In time frequency domain, wavelet can generate loss or 
lossless decomposition. Furthermore, adding noise to the 
frequency sub-bands of a decomposed image provoke 
distortions in its spatial representation. But it is important 
to know that the potency of these distortions differs with 
the wavelet kind. In this section, we check witch wavelet 
type can affect the less the spatial image intensities when 
the watermark is added. The diagonal sub-band is used for 
watermark insertion and the gain factor coefficient is fixed 
as 0.8. The table below sum up the comparison study. After 
inserting the watermark, the PSNR between the original 
image and the watermarked one is computed. The wavelet 
decomposition that engenders the less distortion to the 
spatial representation of the watermarked image is 
considered. 





Wavelet Kind 


PSNR (dB) 


1 


Haar Haar 


37.3 


2 


Daubechie2 db 2 


38.65 


3 


Daubechie4 db 4 


38.85 


4 


Biortho gonale 3. 7 Bior 3. 7 


41.83 


5 


Symle4 Sym 4 


41.5 


6 


Symlet8 Sym 8 


41.89 


7 


Coiflet2 Coif 2 


41.5 



Table 1: Different wavelets used and the corresponding generated 
distortions. 



Using the sixth wavelet, the watermarked image presents 
lower amount of introduced distortions. 




2. Optimal frequency sub-band choice 

An image can be transformed by performing a DWT in 
both vertical and horizontal directions, resulting in one 
low frequency sub-band (LL) and three high frequency 
sub-bands (LH, HL, and HH). 
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Figure 4: First level wavelet decomposition 

In fact this decomposition causes some problem with 
respect to the selection of the sub-band in witch is better 



coding the watermark. We don’t know which one from 
the LH, HL or HH sub-bands agree the main with the 
HSV and cause fewer distortions to the original image if 
their coefficients are changed by the watermark coding. 
To answer this inquiry, a study is carried out. It consists 
in fixing the parameters that have not been optimized and 
insert the same watermark with the same embedding gain 
factor in the different frequency sub-band. The insertion 
that engenders lower distortion to the original image is 
considered. Consequently we can increase the gain factor 
and maximize the robustness of the algorithm. The 
watermark insertion in the different sub-bands is 



described by theses equations: 
Yf H . =Xf i . + (x“ -X" 

0,1, J 0 ,i,j rnean \ 0,1, j 0 ,i,j mean . 


)(l + aW iN+j ) 


(3) 


y hl 
1 0 ,i,j 




if + aW MN+iN+] ) 


(4) 


y HH 
1 0 ,i,j 


= x™. +{x" H .-X™. 

u , l ,Jmean \ ", l ,J u , l ,Jmean ■ 


)(l ~l" ^ ’^2 MN+iN+ j ) 


(5) 



As Y 0 . . is the watermarked sub-band, X {) . is the 




original sub-band mean, W is the watermark and a is the 
embedding strength. The result is evaluated by 
comparing the PSNR of the watermarked image in each 
sub-band. The table below presents the gathered results. 



Figure.5: Watermarked cameraman sub-bands 



Gain factor 
coefficient a 


PSNR (HL1 

sub-band 

insertion) 


PSNR (LHI 

sub-band 

insertion) 


PSNR (HHI 

sub-band 

insertion) 


0.1 


50.49 


48.11 


55.18 


0.2 


44.47 


42.09 


49.27 


0.3 


40.95 


38.57 


4581 


0.4 


38.45 


36.07 


43.29 


0.5 


36.51 


34.13 


41.36 


0.6 


34.95 


32.55 


39.82 


0.7 


33.39 


31.21 


38.50 


0.8 


32.43 


30.05 


37.41 


0.9 


31.41 


29.03 


36.43 



Table 2: The PSNR variation the respect to the used sub-band in the 
watermarking process 



By comparing the results, we find that the HH sub-band 
presents more reliability with respect to the HSV and 
causes less distortion to the processed image if it is used 
to code the watermark in. This sub-band is fixed to be 
used in our watermarking process. 

3. Selection of the optimal embedding force 66 a ” 
The embedding force or strength a called gain factor is 
the first parameter in charge of the robustness of any 
watermarking algorithm. The increase of this factor 
causes an increase of the watermark resistance to 
different synchronous and asynchronous attacks. The 
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limit of this factor increase is thresholded by the 
invisibility limit of the coded watermark. If this threshold 
is exceeded, distortions may appear in the watermarked 
image causing visual degradation, revealing the presence 
of an external sequence. The selection of this parameter 
must be well done in order to maximize the robustness of 
the watermark without exceeding the imperceptibility 
threshold. The criterion used to establish the limit of the 
gain factor increase and distortion appear is the PSNR. A 
limit of 37dB is fixed to decide about the presence of 
visible distortions in the watermarked image. The 
following figures illustrate the distortion appearing with 
the gain factor variation. More a increases more the 
watermarked image is distorted. 



Gain factor - 0-5 Gain factor - 5 Gain factor - 10- 




Gain factor - 50 *Gnin factor - 100 Gain factor - 200 




Figure 6: Impact of the gain factor variation 



Gain factor a 


PSNR (dB) 


0.1 


55.18 


0.2 


49.27 


0.3 


4581 


0.4 


43.29 


0.5 


41.36 


0.6 


39.82 


0.7 


38.50 


0.8 


37.41 


0.9 


36.43 



Table 3: PSNR progress with the gain factor variation 




Figure 7: Distortion variation with the gain factor increase 



As shown by table 3, the gain factor corresponding to the 
fixed PSNR threshold is or = 0.8. This is the optimal 
embedding strength that guarantees the maximum 
robustness and minimum imperceptible distortions. 

4. choice of the most useful embedding equation 
Different equations are proposed in the literatures. The 
most used are the following: 



fjij) =f ne , m + (f (i j) ~ f m 
) = f(i,j)X 1 + a.w k ) 



).(1 + a.w .) 



( 6 ) 

(7) 



To improve the watermarking algorithm in to the limits 
of robustness, we have to decide which equation leads to 
a more robust watermark coding. Both equations spread 
the watermark on the image border. The first adjust the 
intensity of a with the mean of the image variation. The 
second embed the watermark linearly in the sub-bands 
c oefficients of the image. 



Gain 
factor a 


PSNR (dB) 


Gain 
factor a 


PSNR (dB) 


0.1 


55.50 


0.6 


41.20 


0.2 


49.7 


0.7 


39.80 


0.3 


46.00 


0.8 


38.70 


0.4 


43.7 


0.9 


37.4 


0.5 


41.80 







Table 4: PSNR progress with the gain factor variation using the second 
equation 




Figure 8: Distortion variation with the gain factor increase using the 
second embedding equation. 



The second equation is more reliable and improves the 
robustness of the algorithm compared with the first one. 

5. Selection of the optimal decomposition level 
As described in the last section, the first decomposition 
generates three high frequency sub-bands (LH, HL, and 
HH). The same process is repeated on the LL sub-band to 
generate the next level of decomposition. For an n-level 
decomposition for an MxN image, the size of the area in 

which watermarks are to be embedded is , as n is 

2 2n 

the level number. 



Decomposition level 


PSNR (dB) 


Level 1 


39.83 


Level 2 


37.31 


Level 3 


35.88 


Level 4 


34.24 


Level 5 


34.38 


Level 6 


30.5 



Table5: Distortion variation against level decomposition 
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Figure 10: Third level decomposition of LENA image. 
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The figure above, show a LENA original image after its 
decomposition until the third level. The following figures 
present the original image and watermark tracked by the 
watermarked images in different decomposition levels 
and respectively the images illustrating the differences 
between the watermarked image and the original one. It 
is clear that the watermark is fixed on the borders of the 
image. From these figures representing the zones of the 
watermark coding we deduce that more the 
decomposition level is high, more the watermark is 
spread and distributed over and near the borders. This 
distribution is controlled in the frequency domain 
depending on the non-randomly selected sub-bands 
coefficients. 



Original image 



Difference result level 1 




Image watermarked in level 2 Difference result level 2 





Image watermarked in level 3 Difference result level 3 





Watermarked image in level 4 



Difference result level 4 





Watermarked image in level 5 Difference result level 5 




Figure 1 1 : Different level watermarked sub-bands and the difference 
between the watermarked and original image 




The following figure represents the PSNR variation with 
respect to the level decomposition. The Distortions 
introduced to the watermarked image are higher when the 
level decomposition of the image is .superior 



PSNR (dB) 




Level 1 Level 2 Level 3 Level 4 Level 5 Level 6 



Figure 12: Distortion variation against level decomposition 

The first level is found more advantageous for watermark 
embedding procedures. It generates fewer distortions 
when compared with levels of higher order. 

4. CONCLUSION 

In this paper, an overview over different watermarking 
techniques in the DWT domain is explored. Processing the 
different parameters that intervene in the watermarking 
process a strategy to optimize these parameters is built up. 
Every parameter is processed distinctly to the others to find 
its optimal value that leads to a robust and imperceptible 
watermarking algorithm. Experiments and tests are 
conducted to locate these values. Injecting these optimal 
parameters in the embedding equation of the watermarking 
process, guarantee better robustness of the watermarked 
image against different attacks and decreases the 
distortions to maintain them under the perceptibility 
threshold. This optimization wasn’t done in the literature 
and can be exploited easily for DWT watermarking 
shames. 
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Abstract 

The aim of this paper is a modeling of precipitations in 
satellite and radar images that are useful for hydrologic 
applications. An investigation on a possible different 
description of precipitations over the sea and on the 
continent is attempted in images concerning the 
Mediterranean region. The model used in this work is the 
autoregressive process AR(p), where p is its order. This 
model makes possible the description of a series of 2208 
satellite images, recorded during a period of 3 months via 
the IR03.9 Meteosat channel, and 4088 radar images 
recorded by the incoherent pulsed radar of setif (Algeria). 
To conduct a detailed study of the phenomenon, a 
scanning area of 120x400 Km in the two areas (sea and 
continent) is adapted to examine all precipitation areas on 
the images, using a windows (5x5 pixels) process. 

The order of the AR model found in this paper, for the 
Mediterranean region and using the two types of images, 
is “1”, which is invariant in time and in space. 

Keywords: Autoregressive, precipitations, Modeling, 
Images, Satellite, Radar. 

1. Introduction 

Over the past decade, great attention has been paid to the 
parameterization of the rainfall fields. In general, the 
most successful hydrological models are those that 
combine spatial and temporal variations of 
precipitations. [9] 

Analyze a chronological series consists in finding an 
adequate mathematical model of the series evolution 
mechanism. The obtained model is used to fulfill the 
objectives such as prediction and control. 

For a series which collect the satellite rainfall estimation, 
the methods published in literature are mainly based on 
the determination of precipitation rates associated with 
different types of clouds, or the characterization of 
atmospheric convection, or tracking the evolution of 
cloud [5] or evaluating the frequency of occurrence of 



cold cloud [7]. However, satellite images do not ensure 
accurate identification of the cloud masses [1]. The 
weather radars are recognized as the most effective tool 
to detect precipitation fields in a given region. The radars 
images are widely used for hydrological purposes, 
rainfall structures and their danger identification for 
short-term forecasting use; also, they serve a rational and 
correct management of water resources. 

Several statistical techniques allow to analyze rainfall 
data collected by networks of pluviometers were 
published in the literature [2, 3, 10]. Other researchers 
have shown that precipitation can be described by 
Markov chains of the first-order with two states taking 
into account the areas occupied in the radar images 
collected in Bordeaux (France) [8]. This representation is 
in the context of a global approach where we assumed 
that the observed state at the present moment depends 
only on the immediate previous state. In this paper, we 
propose a modeling of precipitations over the sea and on 
the continent of the Mediterranean region in satellite and 
radar images using the autoregressive process by taking 
into account additional previous states. 

The remainder of the paper is organized as follows: 
Section (2) focuses on the mathematical formulation of 
an autoregressive process, sections (3) and (4) deal 
respectively with the database used in this work and the 
data processing. We illustrate and validate different 
results in sections (5) and (6). Finally, and in section (7), 
we give our conclusion. 

2. Mathematical formulation 

2.1. Autoregressive process of order p 

Autoregressive models (AR), assume that X t is a linear 
function of previous values. [6] 

X,=f >,x, (1) 

1=1 

Where fa , in this equation are the coefficients of auto- 
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regression. 



The autocorrelation function is a very important function 
in the characterization of a linear process, it’s written as 
follow [4]: 



p(h) = 



r(h) 

r(o) 



z(x,-x |x,.„-x) 



z(x,-x) 



2 



( 2 ) 



Where y{h) is the auto-covariance function and X is the 



average. 



This function provides information about the process 
memory, that is to say, the degree of dependence between 
observations at time t and those made at the time t-h. 



2.2. The Autoregressive process coefficients 

For P = 1, it is easy to see that ) and 

cr 2 = 1 - (j) 2 x , for higher orders (p > 2), the coefficients 

of the AR process are calculated using the following 
recurrence relations : [4] 



The Radar images consist of 4088 images collected in 
Setif in 1999. These images were taken by incoherent 
pulsed radar, (see Figure l.b). All images are recorded 
according to the PPI mode (Plan Position Indicator). 
They are represented in the format 512x512 pixels with a 
spatial resolution of lkmxlkm. The number of color 
levels coding each pixel of these images is 16. 

When we treated the satellite and radar images of the 
Mediterranean region we have seen that the echoes of 
rain on these images were characterized by high 
persistence. In addition, we found that the echoes of rain 
were standing over a period of more than a month. 

Such properties imply that changes in precipitation over 
time can be described by a sequence of random variables 
forming an autoregressive process. 

4. Data processing 

In this study, the images are divided into two zones; one 
corresponds to the sea, the other one corresponds to the 
continent (see Figures l.a and l.b). For a detailed 
analysis of the phenomenon, we proceed to do a 5x5 
pixel windows processing. To study a good part of our 
images, left and right shifts in both zones are applied. 



^hh 



r{h)-^0 h -i, p (h- p) 

p= l 



CJ 



2 

h - 1 



( 3 ) 



^h,p ~ 0h-l,p fihhfih-XM-p ( 4 ) 

*?=i -iA (5) 

i=\ 

If the process is of the order p , then: 



& P+ u = 0p,i 

0 p +i,p+i ~ 0 

The order of an autoregressive process is the value for 
which the partial autocorrelation function is zero. 
Estimated partial autocorrelation coefficients are 
considered invalid if they fall within the confidence 
interval constructed from the standard deviations. 
It is recognized that a series follow an AR (p) if the 
correlogram of its autocorrelation function decreases to 0 
and its partial autocorrelation function vanishes beyond a 
lag P- 

3. Database 

To build our database, we considered three months of 
satellite and Radar observations in the Mediterranean 
region. The satellite images were taken by the channel 
IR03.9 of Meteosat (see Figurel.a), during the period of 
November, December 2009 and January for the 2010 
year. For each day we recorded 24 images, the pixels of 
those images are coded over 8 bits and with a spatial 
resolution of 5kmx5km. 




Figure l.a. Satellite image of the Mediterranean area 
shows the two study zones 




Zone 1 
(The sea) 



Zone 2 

The continent) 



Figure l.b. Radar image of the Mediterranean area 
shows the two study zones 



The results versus time for both areas are represented by 
figures (2. a and 2.b). These curves are respectively 
obtained using the precipitations in the satellite images 
and the factors of reflectivity (ZdBz) in the radar images. 
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Day 

-•-Precipitations over the sea 
-•-Precipitations over the continent 



Figure 2.a. Daily precipitations over both sea (Zone 1) and 
continent(Zone 2) (Satellite images) 




-♦-Factor of reflectivity over the sea 
-►Factor of reflectivity over the continent 



Figure 2.b. Daily factor of reflectivity over both sea (Zone 1) and 
continent(Zone 2) (Radar images) 



From the previous curves, it is clear that the data series 
are stationary as they exhibit fluctuations in rainfall, 
around an average value, 13.27 for the precipitations 
given by the satellite images process and 13.44 for the 
reflectivity’s factor given by the radar database. 



5. Results and interpretations 

The following figures show the variation of the first 15 
autocorrelation coefficients (AC) and partial 
autocorrelation coefficients (PAC) for both areas. 

The confidence band is given inside the interval [-0.2, 

0 . 2 ]. 




Figure 3. a. Variations of autocorrelation coefficients (AC) and partial 
autocorrelation (AC). 

(Process using satellite images for zone 1) 




Figure 3.b. Variations of autocorrelation coefficients (AC) and partial 
autocorrelation (AC). 

(Process using satellite images for zone 2) 
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Figure 3.c. Variations of autocorrelation coefficients (AC) and partial 
autocorrelation (AC). 

(Process using Radar images for zone 1) 




Figure 3.d. Variations of autocorrelation coefficients (AC) and partial 
autocorrelation (AC). 

(Process using Radar images for zone 2) 



We note that the autocorrelation coefficients decrease 
exponentially on the one hand and the partial 



autocorrelation coefficients are within the confidence 
band defined by the interval [-0.2, 0.2]. 

For the values P=l, 2 and 3, the coefficients of the AR 
process are calculated and presented in the following 
table: 





Zone 1 
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Table 1. The coefficients of the AR process 
(Satellite and Radar images). 



The Autoregressive model was applied to all windows of 
the studied areas, where the parameter ^ varies between 
0 and 0.871. The analysis of the partial autocorrelation 
coefficients fa shows that: 

• The process is stationary to the first order. 

• The coefficients (j) ip of the zone continent are 
slightly higher than the coefficients of the zone 
sea. 

• The variance of the error is larger over sea than 
over continent. 

• The first order Autoregressive model (AR(1)) is 
well suited to describe precipitation. 

6. Validation 

After identifying the model, it is important to verify the 
adequacy of the model used with the observations. As the 
errors are not observable, they are replaced by the 
estimated errors calculated from the estimated parameters 
of model: 
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et=X,-t ( 6 ) 

i=\ 

The fit test will be based on the chi-square test and the 
autocorrelation function of the estimated errors. This 
function is written as: 



P h 





t = i 



(7) 



However, because of some dispersion in the calculation 
of autocorrelation coefficients, it is better to consider 
them in their entirety. Thus, we calculate the first 15 
autocorrelation errors. 



Then we use the statistical distribution given by Box and 
Pierce [4]: 

Where n is the total number of samples studied. 

We compare the values of Q n with the threshold £ read 
on the table of chi-square, which is 23.68 for 14(15-P(=1)) 
degrees of freedom and a risk of error of a=0.05. 



Qn 


Zone 1 


Zone 2 


Process by Satellite data 


20.53 


18.32 


Process by radar data 


14.14 


11.88 



Table 2. Values of Q n for both areas 



The results given in Table 2 show that Q n is still below 

the threshold £ for both areas (sea and continent). The 
errors are independent of each other. 

7. Conclusion 

The aim of this paper is a modeling of the satellite and 
radar images precipitations that are useful for several 
applications and the study of the precipitations over the 
sea and on the continent of the Mediterranean region. 

The results of the statistical analysis of images showed 
that the rainfall is well described by a first order 
autoregressive process (AR(1)), irrespective of their 
types, over sea or land. This result is in good agreement 
with those published in the literature. This work showed 
that the historical rainfall data is limited only to the 
previous day. This work can be used in the agricultural 
field and in hydrology for the selection of dam sites. 
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Abstract 

Intravascular ultrasound (IVUS) imaging is a catheter- 
based medical technique establishing itself as a useful 
tool for studying coronary atherosclerotic disease. It 
provides cross-sectional inside view of blood vessels that 
allows a complete quantitative assessment of their 
morphology, such as vascular wall, nature of 
atherosclerotic lesions or plaque shape and size. Manual 
processing of IVUS images is a tedious and time- 
consuming task and it is not efficient for clinical 
purposes. In this work, a new 2D IVUS segmentation 
model that incorporates a fast-marching algorithm and is 
based on gray level gradient of the vessel wall structures 
is developed. The fast-marching segmentation algorithm 
computes the lumen, intima plus plaque structure, and 
media borders simultaneously. Preliminary results of this 
new IVUS segmentation model agree very well with 
contours drawn manually by physicians. Moreover, fast- 
marching segmentation is appeared to be less sensitive to 
initialization. 

Keywords: Intravascular ultrasound (IVUS); 

Segmentation; Gray level gradient; Fast Marching; 
Coronary arteries 

1. Introduction 

Coronary angiography is the gold- standard method for 
imaging and diagnosis of coronary atherosclerotic disease. 
However, it is restricted by its inability to quantify plaque 
burden beyond the luminal silhouette created by the 
angiographic contrast. In the last two decades, the 
intravascular ultrasound (IVUS) imaging has been proven 
better in the assessment of coronary atherosclerosis [1]. 
Intravascular ultrasound (IVUS) technology is a 
relatively new imaging technique that can offer 
information complementary to angiography technology, 
while IVUS provides an inside view of the arterial wall; 
coronary angiography illustrates only the silhouette of 
that vessel in the body [2, 3]. 

IVUS is a catheter-based technique that provides two- 
dimensional (2D) cross-sectional images of coronary 



artery and, therefore, accurate information concerning 
the lumen, wall and plaque area, and vessel morphology. 
Most importantly, the tomographic nature of 
intravascular ultrasound makes 3D reconstruction of the 
vessel wall possible. For this purpose, fusion between 
biplane angiography and IVUS to recover the catheter 
path provides geometrically accurate 3D reconstruction. 
However, a typical intravascular ultrasound bull-back 
produces several hundred of images making non- 
automatic analysis of the data long, fastidious and 
subject to intra- and interobserver variabilities. These 
could be serious limitations against the clinical usage of 
IVUS technique. Therefore, because of poor IVUS image 
quality due to the existence of speckle noise, imaging 
artifacts, calcifications shadowing and rupture of parts of 
the vessel wall, it is necessary to develop automatic 
segmentation methods taking into account the nature of 
IVUS bull-backs. 

Several approaches for semi- and full automatic IVUS 
segmentation have been proposed so far, including 
texture analysis [4], active contours [5], knowledge- 
based graph searching [6], minimum cost algorithms [7], 
and region growing [8]. Among them, the active contour 
model and its variations showed remarkable feasibility 
and accuracy [9-11]. However, some of these algorithms 
require manual contour initialization, thereby increasing 
the user interaction, and concomitantly the uncertainty of 
the segmentation. In the present study, usage of a local 
image characteristic, i.e. the gray level gradients of the 
IVUS image, combined to a fast-marching segmentation 
model, a local-based method handling topological 
contour changes is proposed to address the IVUS 
segmentation problem. 

2. Methods 

Image preprocessing : The preprocessing [16] of the 
image data consists of three steps: (a) the determination 
of the region of interest (ROI), (b) representation of the 
images in polar coordinates, and (c) reduction of noise. 
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Figure 1: Original IVUS image (left) and corresponding polar 
coordinate images before (right top) and after (right bottom) the 
removal of catheter-induced artifacts. 



The first step consisted in determining the image zones 
where the regions of interest (ROI) are extracted yielding 
a squared image. 

The transformation of the IVUS images in polar 
coordinates is important to facilitate efficient pre- 
processing of local image regions in terms of their radial 
and tangential directions. Besides, it facilitates a number 
of other pos-processing steps, such as contour 
initialization and the detected contour smoothing. 
Consequently, each of the original IVUS images is 
transformed to a polar coordinate image where columns 
and rows correspond to angle and distance from the 
center of the catheter, respectively. The polar coordinate 
image, denoted 7(r,0), is used for the remaining analysis 
process. 

The images produced by IVUS bull-backs include not 
only tissue and vessel regions but also the transducer of 
the catheter itself. The latter defines a dead zone of radius 
equal to that of the transducer, where no useful 
information is contained. Because its good results for 
speckle reduction and edge enhancement, an adaptive 
median filter of size 5X5 is applied to remove these 
catheter-included artifacts. 

Fast-Marching Segmentation Model: 

Fast-Marching Methods and Level Sets Methods are 
computational techniques for tracking propagating 
interfaces. They share the virtues of working in an 
arbitrary number of spaces dimensions with no change, 
handle topological merger and splitting with no special 
procedures, and accurately and efficiently compute the 
motion of fronts moving under complex speed laws, 
including large variations in velocities and shape 
discontinuities. 

Fast Marching Methods, introduced by Sethian in [14], 
approximate the solution of a boundary value partial 
differential equations view of propagating interface, 
while level set methods, introduced by Osher and Sethian 
in [12], approximate the solution of initial value partial 
differential equation. At the core, both techniques rely on 
viscosity solutions for Hamilton- Jacobi equations, 
upwind schemes for hyperbolic conservation laws, and 
theory of curves and surface evolution developed by 
Sethian in [12]. 

Consider a boundary, either a curve into two dimensions 
or a surface in three dimensions, separating on region 
from another. The speed function F, which used to 
advance this interface may depend on a wide variety of 
factors, can be written as: 



F=F (L, G, I) (1) 

• Local image properties: they are determined by 
local geometric information, such as curvature 
and gradient gray level. 

• Global image properties: they depend on the 

shape and position of the interface front. 

• Independent properties: they are those 

independent of the shape of the front. 

The difficult challenge in interface problems is to 
guarantee that the produced speed function model F is 
adequate. 

Level Set Methods: Level Set Methods view a moving 
front as the zero level- set of a function (fix, y, t). The 
evolution equation for the interface moving with speed F 
in its normal direction can be written [13] in the 
following general form 



M|Z^) +F |v,(| = o ( 2 ) 

The level- set model can be applied to image 
segmentation by interpreting image contours as the 
propagating interface final position [13, 14]. To achieve 
this, the speed function should become close to zero 
when the propagating front meets with the desired 
contour thus making the interface stop at this position 
and then ending the segmentation process. 

The key idea in level set methods is to approximate the 
gradient in the level set formulation in a way that 
constructs the correct entropy- satisfying condition. One 
of such schemes is given in [14], namely 
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Fast Marching Methods: Fast Marching Method can be 
described as the evolution of an interface propagating in 
the same direction with a speed function F that is neither 
strictly positive nor negative. In this case, the evolving 
boundary must be inside the region to segment for a 
positive speed (or outside the desired region for a 
negative one). The central idea behind the fast-marching 
formulation is to systematically construct the arrival time 
T(x, y), which is the time when the evolving contour 
crosses the point (x, y). The T function satisfies the 
partial differential equation (4), stating that the arrival 
time difference between two neighbour pixels increases 
as the speed of the contour decreases. 



|vr|F = 1 (4) 

As discussed earlier, the fast marching requires carful 
construction of upwind, entropy-satisfying scheme to the 
gradient approximation. Using the above scheme, we 
have 
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The propagation of the front is done via the construction 
of the arrival time function T. The construction algorithm 
[14] selects the front point having the smallest value of T 
and calculates the arrival times of its four neighbours, 
and so on until the front has propagated across the entire 
image. The front is considered stationary when the arrival 
time gradient is sufficiently high. 

Since multiple borders (lumen-intima and media- 
adventitia) must be detected on the IVUS images, the 
fast-marching segmentation algorithm is done via a 
multiple interface extension [15]. Every interface 
associated to the vessel wall contours evolves at a 
velocity defined in terms of gray level gradient image 
feature. The speed function is given by Sethian [13]: 



F{ij) 



1 

i+|vg ct */ s 



( 6 ) 



where G 0 is a 9x9 pixel symmetric Gaussian smoothing 
filter of standard deviation a = 0.5 and the gradient is 
computed in 2D. Therefore, this F function speeds up the 
interface fronts on low gradient regions. 

Segmentation initialization: The fast marching 

segmentation method requires an initialization procedure 
which s necessary to detect the pixels which are likely to 
belong to the lumen-intima and media- adventitia borders. 
Two initialization steps were defined to this end, each 
used for the initialization of the respective border. 

At the first initialization step, the fast marching based 
method is applied to IVUS image to detect intima border 
by defining a circle of radius 1.5 times larger than radius 
of the transducer located at the centre of the (ROI) image. 
The initial contour can be located easily due to the 
catheter clearly different contrast with the surrounding 
region. 

The speed F in equation (6) should take negative value in 
order to speed up the expansion of the curve after some 
iteration to locate the intima boundary. In the second step, 
a large circle is placed outside the IVUS image and with 
a positive value for F the curve can shrink faster to the 
media-adventitia border after some iteration. 

3. Results 

Segmentation results for the gray level gradient based 
fast marching method are shown in Figure 2 and Figure 3. 
The lumen and media detected boundaries are presented 
for typical cross-sectional IVUS images of the whole in- 
vivo and simulated pullbacks. 

The results are promising and they demonstrate the fast 
marching accuracy in determining lumen and plaque 
contours. The media-adventitia border detection depends 
extremely on the quality of the images so a more accurate 
set of results depends directly on the strategy followed 
the contour initialization and detection. 

The fast marching segmentation model is applied first to 
simulated IVUS images and then to real IVUS images. 





(c) (d) 

Figure 2: Simulated IVUS (a) Intima border detection (b) media- 
adventitia border detection (c) Intima-LEE border detection (d). 




(e) 



(f) 



Figure 3: Real IVUS images (a),(b) Intima border detection (c),(d) 
media-adventitia border detection (e), (f). 
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Results show the accuracy of our methods in detecting 
lumen border and media-adventitia contour. As shown in 
Fig. 2, the results of this method for a simulated IVUS 
image of coronary artery, the boundaries of the vessel 
wall were successfully extracted. Fig. 3 shows an 
example outcome of the method, the intima and medial- 
adventitial borders were detected despite the presence of 
noise. 

4. Conclusion 

The goal of this work is to demonstrate the IVUS 
segmentation potential of the fast marching method, and 
the usefulness of local image features such as gray level 
gradients of vessel wall anatomical structures. Fig. 3 
showed that vessel wall boundaries can be identified even 
when image contrast is not very high and when the 
luminal contour shape is irregular. 

Qualitative analysis of the detected contours indicates 
that the level set is an accurate segmentation method for 
intravascular ultrasound imaging, but this should be 
further confirmed from validation with several manually 
traced contours by independent experts. 

These preliminary segmentation results showed that this 
is a promising technique for IVUS image processing. 
Further investigation will be focused on improving the 
algorithms speed and on new methods to automate the 
initialization step to minimize user’s interactions. 
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Abstract 

Fractals provide an innovative method for generating 
three-dimensional (3D) images of real-world objects by 
using computational modelling algorithms based on the 
imperatives of self-similarity, scale invariance, and 
dimensionality. Images such as coastlines, terrains, cloud 
mountains, and most interestingly, random shapes 
composed of curves, sets of curves, etc. present a multi- 
varied spectrum of fractals usage in domains ranging 
from multi-coloured, multi-patterned fractal landscapes 
of natural geographic entities, image compression to even 
modelling of molecular ecosystems. Fractal geometry 
provides a basis for modelling the infinite detail found in 
nature. Fractals contain their scale down, rotate and skew 
replicas embedded in them. Of the many different types 
of fractals that have come into limelight since their origin, 
the Sierpinski fractal has eluded both mathematicians and 
computer scientists alike. And the two-dimensional (2D), 
3D, etc. versions of the same have been realized based on 
the starting axioms/generators as either triangle/pyramid 
or square/cube. The resulting fractals are Sierpinski 
Triangle and Sierpinski Pyramid in case of the triangle- 
based generations; and Sierpinski Carpet and Sierpinski 
Gasket in case of the square/cube based fractals. This 
paper describes a methodology that illustrates the 
closeness in self- similarity otherwise termed as closing 
the self- similarity loop - of fractal images by way of 
generating a 3D Sierpinski Gasket fractal starting with a 
cube as the baseline shape and applying the same 
algorithm recursively by way of Iterated Function System 
(IFS)-like transformations of ((x,y) rotation, z (zoom)) - 
changing a newly introduced property called depth, from 
3 to 2 to 1 in that order - to arrive at a new 3D Sierpinski 
Gasket that resembles a self-similar copy of the original 
3D cube-based Sierpinski Gasket [9], [10], [11]. The depth 
property represents an additional variant which represents 
an iterative- recursive execution count. 

Keywords: Fractals, 3D Images, Sierpinski Traingle, 
Sierpinski Pyramid, Sierpinski Carpet, Sierpinski Gasket, 
3D rendering, Recursion, IFS. 



Nomenclature: 

1.3.7 [Computer Graphics]: Three-Dimensional 

Graphics and Realism - Fractals, Visible Line/Surface 
Algorithms ; 1.3.3 [Computer Graphics]: Picture/Image 
Generation - Display Algorithms, Viewing Algorithms ; 
G.1.2 [Numerical Analysis]: Approximation - Wavelets 
and Fractals, Special Approximation Functions 

1. Introduction 

A fractal is a rough or fragmented geometric shape that 
can be subdivided into parts, each of which is (at least 
approximately) a reduced size copy of the whole or in 
other words, is self- similar when compared with respect 
to the original shape. The term was coined by Benoit 
Mandelbrot in 1975 and was derived from the Latin word 
“fractus” meaning “broken” or “fractional”. The primary 
characteristic properties of fractals are self- similarity, 
scale invariance and general irregularity in shape due to 
which they tend to have a significant detail even after 
magnification-the more the magnification the more the 
detail. In most cases, a fractal can be generated by a 
repeating pattern constructed by a recursive or iterative 
process. Natural fractals possess statistical self- similarity 
whereas regular fractals such as Sierpinski Triangle, 
Sierpinski Gasket, Cantor set or Koch curve contain 
exact self- similarity. The 3D renderings of Sierpinski 
Fractal with Triangle as the initial axiom/generator are 
the Sierpinski Traingle and set of n-dimensional 
Sierpinski Pyramid(s) where n>=3. Analogously, the 3D 
renderings of the Sierpinski Fractal with Square as the 
starting axiom/generator are the Sierpinski Carpet and the 
set of n-dimensional Sierpinski Gaskets, where n>=3. 
Paul Bourke has studied the Sierpinski Gasket and its 
versions in [17]. In this paper the authors describe a 
methodology that attempts to illustrate the concept of 
closing the self- similarity loop of fractal images by way 
of generating a 3D Sierpinski Gasket fractal starting with 
a cube as the baseline shape and applying the same 
algorithm recursively by way of IFS -like transformations 
of ((x,y) rotation, z (zoom) ) - changing the recursion 
depth from 3 to 2 to 1 in that order.The source image for 
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the 3D Gasket is generated based on the 2D version of 
the same that is obtained by applying IFS 
transformations to a square figure after n iterations where 
n>=3. The original C program - courtesy of Audrey 
Mirtchovski (aam396@mail.usask.ca) - is used in the 
above implementation [15]. The displayed output of the 
same based on a typical set of inputs is resented. The 
remainder of the paper is organized as follows: Section (2) 
focuses on brief description of Sierpinski Triangle, 
Sierpinski Pyramid, Sierpinski Carpet and Sierpinski 
Gasket. Section (3) emphasizes on method of recursively 
generating the 3D Sierpinski Gasket to demonstrate 
closing the self- similarity loop. Section (4) shows the 
experimental results and Section (5) provides the 
concluding remarks. 

2. About Sierpinski Triangle, Sierpinski 
Pyramid, Sierpinski Carpet and 
Sierpinski Gasket 

A brief description of Sierpinski Triangle, Sierpinski 
Pyramid, Sierpinski Carpet and Sierpinski Gasket along 
with their properties followed by a description of 
generating the 2D and 3D versions of the same by using 
iterative function systems is presented in the paragraphs 
that follow. A detailed account of the Sierpinski Triangle 
and Sierpinski Gasket can be found in the book(s) as in 
[14] and in [9]-[10]. 

Sierpinski Gasket/Pyramid 

The Sierpinski gasket, also known as Sierpinski triangle 
is a fractal produced by starting with one big filled-in 
triangle and dividing the triangle into four smaller 
triangles, then cutting out smaller and smaller ones from 
it in appropriate places in a recursion process which 
involves blanking out the center triangle and carrying out 
the above recursion process to operate upon the 
peripheral triangles. This involves a starting figure, a 
generator (the transformation) and a recursive process of 
applying the generator to the starting figure. The 
Sierpinski Gasket can also be generated by starting with a 
rectangle as the starting figure and applying a set of 
affine transformations (a rotation, a scaling and a 
translation) in each iteration step. The resulting fractal is 
the Sierpinski Gasket, a self- similar, equilaterally shaped 
triangular image that is symmetric with respect to the 
starting triangle. 

The Sierpinski Pyramid is a three dimensional version 
of the two dimensional Sierpinski Gasket. Like the 
Sierpinski gasket, it is a recursive structure. A recursive 
structure of power 3 generates a Sierpinski Pyramid in 
3D. Essentially this means that the 3D Sierpinski 
Pyramid consists of a pyramid constructed out of 4, 
power 2 Sierpinski Gaskets. 

Sierpinski Carpet 

The Sierpinski carpet is a fractal produced by removing 
the central square piece from a square sliced into thirds, 
horizontally and vertically. The area tends to zero and the 
total perimeter of the holes tends to infinity. Recursively 
cutting out smaller and smaller ones from it in the 
appropriate places yields the Sierpinski Carpet. 




3. Method of Recursively Generating the 3D 
Sierpinski Gasket to Demonstrate Closing 
the Self-similarity Loop 

A step-by-step description of the method used to generate 
and re-generate 3D fractals based on the cubed-based 
version of the 3D Sierpinski Gasket is outlined in the 
following steps: 

As Figures 1, 2, 3 suggest, the originally generated 2D 
Sierpinski Gasket (Figure 1.3) uses a square as initiator 
(Figure 1.1) and the IFS as generator (Figure 1.2). The 
equivalent 3D version of the Sierpinski Gasket (Figure 
2.2.1) uses a cube as initiator (Figure 2.1.1) and the IFS 
as the generator (Figure 2.1.2) all of them having an 
additional property called depth = 3; Figures 2.2.2, 2.3 
and Figure 3 are the maneuvered versions of Figure 2.2.1 
obtained by recursive re-generation with the same depth 
property (i.e., depth=3). Figure 4 is the maneuvered 
version of Figure 3 obtained by recursive re-generation 
with the depth property equal to 2. Finally Figure 5 
seemingly displays the starting 3D Sierpinski Gasket as a 
3D Cubed version of Figure 2 with the depth property 
changed to 1. The step-by-step procedure is as follows: 

1 . Adding a new depth property that is input- 
dependent to a chosen 3D Cube image. The 
default depth is chosen to be 3. 

2. The generating IFS consists of a (x,y,z) 
transformation that consists of an (x, y) rotation 
and a z-based zoom or the scaling factor. 

3. Setting the reference frame by specifying a 
(height, width) pair to the enclosing (movable) 
window. The default colour scheme is red/black 
for each cube-based 3D fractal. 

4. Using the IFS, auto-generate 8 self-similar 
fractal blocks and position them at the 8 
different corners of its parent cube fractal. This 
creates a linked chain of smaller but self- similar 
3D Sierpinski Gaskets - that are within the 
boundaries of the frame-of-reference. 

The dynamic variables involved in this method are: 

1 . The depth property that represents an 
additional new variant that defines a 
projection 

2. The reference frame itself 

3. The (x,y,z) triple representing the translation for 
the 3D fractal. Here z represents the zoom i.e., 
scaling factor 

The dynamic translation is achieved by auto-capturing 
the variations in the scaling (the z- variable) value and the 
corresponding (x,y) rotation; and auto-adapting the GL 
projection(s) to the conforming viewports. 

The program runs as a Windows console application 
using the Open Glut API, with the 3D Sierpinski Gasket 
being generated as an IFS-based recursive version of the 
2D Sierpinski Carpet that uses a cube as the base initiator 
for the IFS. Successive fractal images are obtained by 
interactive mouse-based positional translation and 
context-aware zoom-based scaling in a dynamic fashion 
and varying the depth property from 3 to 1. Using 
MATHLAB 3D Image Rendering software, the displayed 
graphics are processed for ‘true’ GUI compatibility. 

The original 3D Sierpinski Gasket (Cube based version) 
is shown in Figure 2.2.1. This has a depth of 3. By 
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mouse-over shearing and skewing of the same, the 
algorithm re-generates Figures 2.2.2, 2.3, 3 all of which 
also simulate a depth of 3. Figure 4 shows the 3D 
Sierpinski Gasket obtained by applying the above angular 
shearing to emulate a depth of 2. The final Figure 5 
shows the 3D Sierpinski Gasket obtained by applying the 
above angular shearing to emulate a depth of 1. This 
proves that the 3D Sierpinski Gasket of depth 3 and depth 
1 are almost identical in geometric similarity - a result 
that demonstrates the closeness in self- similarity of 
fractals. In this case, the self- similarity was symmetric in 
nature. 

The IFS used for the experimental results is presented in 
the details that follow. 

1. First we get the 2D Sierpinski Gasket followed 
by GL DEPTH BUFFERING and a set of 
recursively iterated IFS to get to the 3D Cube. 

2. The IFS Code of the 3D Sierpinski Gasket (as 
discussed in this paper) consists of an Initiator 
(or axiom) and a generator (i.e., a set of rules) 
that are applied as an IFS and recursively 
executed n times where 1 <= n <= 9. The degree 
of iteration or recursion is termed as depth of 
this variable. 

The experimental results demonstrated here use 
depths 3, 2, 1 respectively. Also the output of depth n 
based 3d Sierpinski Gasket is used as the input for the 
depth n-1 based 3D Sierpinski Gasket. 

//Initiator: Cube with the following definition: 
Recursive depth = 3; 

Cube Side =10; 

(x,y,z) = (0,0,0); // Translation 

// Draw a cube of size=n. Each triple corresponds to a 

// distinct edge of a cube. 

float boundary [8] [3] = { 

{- 1 , - 1 ,- 1 }, 

{- 1 , 1 ,- 1 }, 

{ 1 , - 1 ,- 1 }, 

{ 1 , 1 ,- 1 }, 

{- 1 ,- 1 , 1 }, 

{- 1 , 1 , 1 }, 

{ 1 ,- 1 , 1 }, 

{ 1 , 1 , 1 }, 

}; 

Draw_CUBE() 

{ 

// the pseudo code goes here. 

} 



//Generator: the function with following set of rules: 

Draw_RecursiveDepthCUBE(int depth) is the generator. 
Here depth is recursive depth (i.e., the number of 
iterations deep the generator can be applied to itself as a 
recursive call to generate different versions of the 3D 
Sierpinski Gasket without losing its self- similarity. 



//The zoom factor or the z-buffer (depth buffer 
//corresponding to the GL_DEPTH BUFFERING IS 



// TREATED SEPARATELY FROM THIS 'DEPTH' 
//VARIABLE 



{ 

Axiom = Draw_CUBE(); 

F = Draw_RecursiveDepthCUBE(l<=depth<=9); 

F 3 = Draw_RecursiveDepthF 3 CUBE(3); 

F 2 = Draw_RecursiveDepthF 3 CUBE(2); 

Fi= Draw_RecursiveDepthF 2 CUBE(l); 

} RECURSIVELY_RE-GENERATE_3DSIERPINSKI; 

Additional details and other related techniques for further 
research can be found in the book(s) as in [14] and in [7]- 
[ 10 ]. 

4. Experimental Results 

This section depicts the results obtained by applying 
above described method(s) to demonstrate one scenario 
of closing the fractal self- similarity loop. One conclusion 
that can be made in a definitive manner from this 
experiment is: We started with a cube to generate the 3D 
Sierpinski Gasket (by imitating the 3D Sierpinski 
Triangle) and then recursively re-generated the 3D fractal 
to arrive a new 3D Sierpinski Gasket version - that 
resembles an almost 360-degree self-similar 3D version 
of the original fractal - but one that is constructed as a 
self-manipulated syndicate of successive iterated 
version-results, each of which represents a 3D fractal- 
pore of the initial 3D Sierpinski Gasket. The program 
was written in the C++ programming language, the 3D 
“visual” rendering was done using the Open Glut 
Graphics library. It runs as a Windows console 
application with the variable depth being input as the first 
command-line argument (any no-zero integer value, 
preferably from 1 to 9). 

Figures 1.1 to 1.3 show iterated version displays of 
generating the 2D Sierpinski Gasket after 0, 3, n 
iterations. Figure 2.2.1 shows the 3D Sierpinski Gasket 
with depth 3 (the default) generated using a cube as the 
initializing axiom (Figure 2.1.1) and the IFS-based 
generator as shown in Figure 2.1.2. Figures 2.2.2 and 
Figure 3 show the recursively re-generated versions of 
the same IFS. Figure 3 is used as primary input source for 
the demonstration process. Figures 4 and 5 depict the 3D 
versions of the Sierpinski Gasket with depths 2 and 1 
respectively. 



Figure 1.1 Initial 2D Sierpinski Gasket - Square. 
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Figure 1.2 2D Sierpinski Gasket after 3 IFS -iterations 
applied to Figure 1.1. 




Figure 1.3 2D Sierpinski Gasket, after applying IFS 
transformations randomly (to Zoom-in) on Figure 1.2. 




Figure 2.1.1 Initiator for the Sierpinski Gasket (Cube- 
based) having depth=3. 




Figure 2.1.2 Generator for the Sierpinski Gasket (Cube- 
based) having depth=3. The generator is based on the IFS 
Code as outlined in the end of section 3. 




Figure 2.2.1 Original 3D Sierpinski Gasket (Cube-based) 
having depth=3 obtained from the initiator and generator 
as shown in Figures 2.1.1 and 2.1.2. 




Figure 2.2.2 3D Sierpinski Gasket (Cube-based) having 
depth=3 obtained by recursive re-generation of Figure 
2.2.1 (using the generator as shown in Figure 2.1.2). 
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Figure 23 3D Sierpinski Gasket (Cube-based) having 
depth=3 obtained from recursive re-generation of Figure 
2.2.2 (using the generator as shown in Figure 2.1.2). 




Figure 3 3D Sierpinski Gasket (Cube-based) having 
depth=3 and applying the generator on Figure 2.3. 




Figure 4 3D Sierpinski Gasket (Cube-based) generated 
using depth=2 applying the generator on Figure 3. 




Figure 5 The 3D Sierpinski Gasket (Cube-based) 
generated using depth= 1 applying the generator on 
Figure 4. 



5. Conclusion 

The 3D version of the Sierpinski Carpet termed as 
Sierpinski Gasket gives an idea about the nature and 
mathematical aspect of the Sierpinski fractal. This is 
analogous in perspective to the version of the Sierpinski 
Triangle known as Sierpinski Pyramid. This paper 
presents a method of algorithmically maneuvering 3D 
fractal images that illustrates the concept of closing the 
self-similarity loop of fractals by way of generating a 3D 
Sierpinski Gasket fractal starting with a cube as the 
baseline shape and applying the same algorithm 
recursively by way of IFS-like transformations of ((x,y) 
rotation, z (zoom) ) - and changing aadditional variant 
property called depth (that represents a new property 
corresponding to projection) from 3 to 2 to 1 in that 
order.The source image for the 3D Gasket is generated 
based on the 2D version of the same that is obtained by 
applying IFS transformations to a square figure after n 
iterations where n>=3. The displayed output of the same 
based on a typical set of inputs is presented. The 
experimental results are shown in section 4 and 
concluding remarks in section 5. The results demonstrate 
the closeness in self- similarity aspect of fractal images 
and how by using theory, computation and 

experimentation that fractal geometry is both an art and a 
science. 
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Abstract 

This paper presents a methodology for integrating a new 
parameter measuring spatial relationships in the hidden 
Markov models (HMM) in order to detect, interpret and 
predict changes in urban areas from satellite images. This 
approach is divided into three phases: the detection of 
different spatial relationships in the urban area; the 
training of a hidden Markov model using Baum- Welch 
learning algorithm, integrating the changing spatial 
relationships obtained through the Allen's temporal 
algebra; the interpretation of changes in urban area and 
the prediction of future changes. 

Simulated spatiotemporal changes on synthetic data show 
the interest of this method for the analysis of 
spatiotemporal changes of relations between objects. 
Results allows detection and prediction to be performed 
from the various time series of images for the 
observations of spatiotemporal events such as urban 
expansion. It is therefore reasonable to use this approach 
to interpret and estimate the movement of the urban area. 

Keywords: Change detection, Spatial relationships, 
Allen’s temporal algebra, Intra-urban analysis, Hidden 
Markov model. 

1. Problematic and state of the art 

Today, time series of satellites images are easily 
available and allow the evolution of earth surface to be 
observed. These series constitute a great volume of data 
and they contain complex information. For example, 
many spatiotemporal events, such as the harvest, the 
maturation of culture or the evolution of urban areas, can 
be observed and are useful for e.g. the management of 
land use or the understanding of town expansion. 



We propose in this work to monitor intra-urban changes 
using time series of satellites images. 

Severals authors [3] [7] [21] have proposed a 

comprehensive review of existing approaches to change 
detection in remote sensing. Derrode [23] and Carincotte 
[6] have developed a method to detect changes between 
two Synthetic Aperture Radar (SAR) images using fuzzy 
hidden Markov chains. Bouyahia et al. [28] have also 
used hidden Markov models to unsupervised change 
detection in bi-date SAR images. Aach et al. [24] 
developed bayesien algorithms to change detection using 
Markov random fields. Antonius et al. [4] developed a 
technique for Markov change detection that can be used 
to predict future changes based on the rate of change in 
the past and generating probabilities of changes between 
classes. 

Oscar et al. [25] [26], Van et al. [9] and Weizman et al. 
[15], also used information such as the texture or, gray- 
levels for the segmentation and the recognition of urban 
areas. Fai et al. [27] have used perimeter, texture mean, 
and area sush as parameters for segmentation of urban 
area. Finally, Nielsen [18] used an approach (Window 
Independent Context Segmentation (WICS)) for the 
extraction of different urban area categories from satellite 
images. 

In the spatio-temporal segmentation domain, studies have 
mainly focused on the modeling of temporal texture. For 
example, in [2], authors generalize the method of 
Haralick et al. [22] using the co-occurrence matrix in 
three dimensions to extract information related to the 
evolution of a texture. 
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Spatial relationships constitute structured knowledge, as 
opposed to digital information such as gray-level, color, 
or texture that are extracted from images. 

Our goal, therefore, is to present a methodology to 
monitor, interpret, and predict the intra-urban changes, 
using Hidden Markov Model (HMM) taking into account 
the spatio-temporal relationships. 

Hidden Markov Models (HMM): Hidden Markov Model 
(HMM) [8] represents a double embedded stochastic 
process. In an HMM, the observations (Vi) are regarded 
as symbols emitted by non observable states (Si), 
following particular probabilistic functions, whereby the 
state sequence is a first order Markov Chain. 

Let N be the number of states in the model (the 
individual states are denoted as S= {Si,..., S N }, and q t be 
the state of S t at time t, let M be the number of distinct 
observation symbols per state (the individual symbols are 
denoted as V= {vi,...,v M }). A basic HMM consists of 
three sets of parameters: 

• The symbol emission probabilities bj k - the 

probability that symbol v k is emitted by state Sj 

J = P [v k at 1 1 Sf = £ r ] 1< j< N, 1< k< M 

• The state transition probabilities ay- the 

probability of being in state Sj in the subsequent 
time instant given that the current state is Si 

A = fey]=P [ J /=fr.iil 5 E=*rL J ^ N 

• The prior probability distribution :?r-j that the 
system is in a given state Si at the initial time 
instant 

£ = fC|3=P 1< i < N 

An HMM is typically written as: X = { A, B, si}. 

Spatial relations: The spatial relationships constitute the 
basis of linguistic description of spatial configurations. 
These spatial relationships are generally classified into 
different categories [14]: topological, distance and 
directional relationships. They represent an effective 
way to describe structured scenes [12]. 

The distance relationships are based on the notion of 
distance in a Euclidean space. 

The directional relationships represent the positions of 
objects one with respect to the other, or relative to the 
frame of the image, the same way that geographic 
directions are expressed using four cardinal points [1]. 
Indeed, there are mainly three types of directional 
relationships: 

Strict: North, South, East and West. 

Mixed: North, South, East and West, Northeast, 

Northwest, Southeast and Southwest. 

Position : to the right, to the left, below, above. 

Finally, the topological relationships don’t give 
information on the cardinal positioning of objects, but 
rather interaction between two objects. The model the 
most widely used is the model of “nine intersections”of 
Egenhofer [16] [17]. In [19], eight relations as formally 
defined. Figure 1 that capture the topological relations 
between a pair of objects. 



Relation 


Mathematical formulation 


Disjoint (A, B) 


A° n B° = 0 A A° n dB = 0 A dA n B° 
= 0 a dA n dB = 0 


Meet (A, B) 


A° fi B° = 0 a (A° n dB ^ 0 v dA n 
B° ^ 0 v dA fi dB ^ 0 ) 


Inside (A, B) 


A° n B° gt 0 a A° n b- = 0 a A- n B° ± 
0 a dA fi dB = 0 


Contains (A, B) 


A° n B° =£ 0 a A° n B- a 0 a A- n 
B°= 0 a dA fi dB = 0 


CoveredBy(A,B) 


A° n B° 0 a A° n b- = 0 a A- n B° * 
0 a dA fi dB =t= 0 


Covers (A, B) 


A° n B° =t= 0 a A° n b- =£ 0 a A- n B° = 
0 a dA fi dB =t= 0 


Equal (A, B) 


A° n dB = 0 A A° n B- = 0 A dA n B° 
= 0 A dA D B“ = 0 
A A- fl B°= 0 A A- n dB= 0 


Overlap (A, B) 


A° n B° ^ 0 a A° n B- ^ 0 a A- n B° 
=£ 0 



Figure 1 . Formal definitions of eight relations for topological relation 
between two objects. A is an object, dA is the edge of A, A° its interior 
and A“its exterior 



Temporal relations: Temporal relations are explicit or 
inferred relations that are used to describe events or 
ordered states with respect to time. The main models of 
temporal relations are Allen’s interval algebra [10] [11] 
and Vilain’s point algebra [20]. In Allen's theory 13 
relations describe all possible combinations between two 
intervals (Figure 2). 

These basic relations can be combined by disjunction, to 
obtain 2 1 3 relations. In Vilan’s point Algebra, three 
basics relations, above (<), identic (=), and follows (>), 
are considered. 

We are interested here to retrieve different areas 
(buildings, roads,...) present in urban area, the description 
of these areas is very interesting and especially the 
positions and relationships between them. Spatial 
relationships become an asset to detect these 
relationships and to describe the urban structural. The use 
of Allen’s interval Algebra allows all the possible 
relations between two intervals of time to be described 
and can be integrated in the analysis of time series of 
images.Now we propose our method, which aims to 
extract the various urban areas and to detect the different 
spatial relationships between these areas, and interpret 
changes through time using the hidden Markov model 
(HMM). 
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Relation 


Mathematical formulation 


i i before k 


End(i i) < Begin(k) 


i i meets k 


End(i i) = Begin(k) 


i i overlaps k 


Begin(ii) < Begin(k) < End(ii) A End(ii) 
< End(k) 


i i starts k 


Begin(ii) = Begin(k) A End(ii) < End(k) 


i i finished k 


Begin(ii) > Begin(k) A End(ii) = End(k) 


i i during k 


Begin(ii) > Begin(k) A End(ii) < End(k) 


i i equal k 


Begin(ii) = Begin(k) A End(ii) = End(k) 


i i containt k 


Begin(ii) < Begin(k) A End(ii) > End(k) 


i i finished k 


Begin(ii) < Begin(k) A End(ii) = End(k) 


i i started k 


Begin(i i)=Begin(k) A End(i i) > End(k) 


i i overlapped k Begin(k) < Begin(ii) < End(i i) A End(i i) > 

End(k) 


i i met k 


Begin(i i) = End (k) 


i i after k 


Begin(i i) > End (k) 



Figure 2. Allen’s temporal relationships [5]. Begin(i) and End(i), 
represent the start time and end time of intervals 



2. Proposed approach 

The method we propose is composed of three phases. The 
first phase is a processing step that aims to build feature 
vectors, that carry out the evolution of spatial 
relationships through time. 

The second phase is the learning step which aims to train 
the proposed model using the Baum- Welch algorithm in 
order to detect changes. Finally, the third phase is the 
recognition step that allows changes to be interpreted 
and predictions to be made. 

Let X = { A, B, il] be the hidden Markov model, with A 
is the transition probabilities; B is the emission 
probabilities and it the initial probabilities. 

Figure 3 shows a generic model for such a HMM. 



&N1 




Figure 3. A generic model of hidden Markov model. S is the state set 
(Evolution of Expansion is denoted as EvEx)\ V is the observations set 
(Evolution of Spatial relationships is denoted as EvSr), a tj is the state 
transition probability, and b jk is the symbol emission probability 



different regions of interest, and then detect the different 
spatial relations between these regions. 

Finally, we use Allen’s interval algebra to model 
temporal intervals, and we monitor the evolution of 
spatial relationships through time. We finally store the 
information obtained as they are the observations in our 
hidden Markov model (Figure 4). 




Figure 4. Processing module 



Once spatial relationships between regions are detected 
and monitored, the results of their temporal evolutions 
are obtained. We then use learning algorithms to train the 
model using a series of multi-date satellites images. 

Learning module: This module aims at estimating the 
parameter vector A of the HMM on the basis of a set of 
sequences of observations called training corpus. Recall 
that the sequences of observations are the evolutions of 
spatial relationships over time. The learning algorithm 
chosen is the Baum- Welch algorithm [13], since it allows 
a significant improvement compared to traditional 
learning algorithms. Its goal is to maximize the 
likelihood of a model X, it substantially changes the 
model parameters studied to increase the likelihood. The 
algorithm performs its optimization by re-estimating the 
parameters (A, B and sc), of the observed sequence. 

Figure 5 shows the general process of learning system. 



Processing module: Initial data are images acquired at 
different dates. We segment these images to extract the 
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Figure 5. Learning module. xQ represent the temporal evolutions 
of spatial relationships and y(j) represent the expansions of regions. 

3. Results and discussion 

To validate our methodology, we used a series of images 
which are constituted of two regions. In our case we will 
identify the different relationships between two binary 
classes. In a preliminary study, we use synthetic images 
containing two objects of different colors. 



Figure 6 shows some synthetic images containing objects 
of different colors that change through time. 




Figure 6. Synthetic images 



Detection of spatial relationships and monitoring their 
evolutions over time: Table 1 presents the results of 
experimental trials for some images. For each image at 
two distinct instants, we present the spatial relationship 
detected, the temporal evolutions of spatial relationships, 
and the evolutions of intervals based on the Allen's 
algebra of intervals. 

In order to compute the evolution of spatial relations, we 
use the index EvSr as an index for computing changes in 
a spatial relationship between two instants f and t i+ i. 
EvSr= Temporal relation [spatial relationship (f), spatial 
relationship (f+i)] 

For example: 

If [spatial relationships (f) = ‘DISJOINT’ and spatial 



relationships (ti+i) =’ OVERLAP’] 
Then EvSr= ‘Strong Evolution’ 



Instants (i) 


Relation (A, 
B) 


EvSr 


Modeling 
of intervals 


[1998, 2000] 


A disjoints B 


Low evolution 


i i meets i 2 


[2000, 2002] 


A Meets B 


[2001, 2004] 


A Meets B 


No evolution 


i 2 overlap i 3 


[2005, 2006] 


A Overlaps B 


Strong evolution 


i 3 disjoint i 4 


[2005, 2010] 


A Covers B 


Middle evolution 


U starts i 5 



Table 1. Results of spatio-temporal relations 



In this case, we can model the expansion of regions 
according to the evolution of spatial relationships. 



Modeling with HMM: 

To use the Hidden Markov Model, observable and hidden 
states have to be defined. Therefore, we will consider 
here the temporal evolutions of spatial relationships as 
the observable states and the expansions of regions as the 
hidden states. 

We propose the following HMM: 

Observable states: {‘A: no evolution’, ’ B : low 
evolution’, ‘C: middle evolution’, ’£>: strong 
evolution’ } 

Hidden states: {‘7: no expansion’, ’2: low 
expansion’, ‘3: middle expansion’, ’4: strong 
expansion’ } 

We introduce the indices obtained which are the 
observable states in HMM, and the indices will be ranked 
according to rule: 

7/* EvSr is null then no expansion 
7/* EvSr is low then low expansion 
7/* EvSr is middle then middle expansion 
7/* EvSr is strong then strong expansion 

• Learning phase 

The results of the evolution of spatial relationships will 
be considered as the observations of the HMM. From 
these observations, we must determine the hidden states 
associated with them depending on the evolution of 
spatial relationships. The observable states and the 
hidden states will be used as the learning parameters of 
the model. This learning is achieved through the use of 
the Baum- Welch algorithm. 

To train our model, we used the alphabet {'A', 'B', 'C', 'D}, 
which represent the evolution of spatial relationships that 
are observable states in the hidden Markov model, so we 
used a series of 102 synthetic images, where each image 
contains two objects of different colors, and these objects 
change through time. We detected the different spatial 
relationships, and we got their evolutions through time, 
and finally we built the observations from the basis of 
changes in spatial relationships. We obtained the 
following observations sequence for this series of 
synthetic images and we introduced them in the learning 
phase for our system: 

“BAAABAABABABABBDBABBAAAAAABDDBBA 

BBABCBAAAABBAABCCBAAAAAABBBBABBAA 

AAAABABDCBADAABBBADCABBCBBBDBACAA 

B” 



After the integration of this sequence in our model we 
obtained the following results: 

The initial probabilities: 



1 


2 


3 


4 


0,152 


0,721 


0,027 


0,000 



The transition matrix: 





1 


2 


3 


4 


1 


0,2861 


0,7139 


0,0000 


0,0000 


2 


0,0000 


0,1150 


0,2750 


0,6100 


3 


0,5509 


0,3378 


0,0000 


0,1123 


4 


0,0000 


0,0046 


0,3712 


0,6242 
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The emission matrix: 





1 


2 


3 


4 


A 


0,3129 


0,0000 


0,0000 


0,9694 


B 


0,0000 


1,0000 


0,8672 


0,0016 


C 


0,4337 


0,0000 


0,0000 


0,0290 


D 


0,2534 


0,0000 


0,1328 


0,0000 



• Interpretation phase 

Our methodology is now able to interpret the spatio- 
temporal events of regions from the HMM. Recognition 
consists in finding the most likely sequence of hidden 
states leading to the generation of a given output 



• Prediction Phase 

To predict the most probable event at a future time, we 
can use the evaluation function included in the 
interpretation system. For example, if we want to predict 
the most probable event at t = 9, considering the 
sequence of observations "BBCDADDB" we first have to 
compute the probability of observing sequences 
"BBCDADDBA", "BBCDADDBB", "BBCDADDBC", 
"BBCDADDBD", then to distinguish the series of the 
most probable observation and therefore we can predict 
the event time t = 9 using the recognition step of our 
interpretation system. 



sequence. This sequence of hidden states is generated 
using the Viterbi algorithm [13]. 



Table 2. Example of results of events prediction 
Table 2 show an example of events prediction, we notice 
that this step allows us to predict the states of area 
expansions. The class B is the most probable and we can 
predict the state of this region. 

4. Conclusion 

The proposed methodology allows the spatiotemporal 
changes in an urban area to be detected from a time series 
of satellites images. That allows the modeling of changes 
in urban area and the further interpretation in order to 



A: 



B: 



0,2861 


0,6317 


0,0000 


0,0000 


0,0000 


0,1150 


0,2750 


0,6100 


0,5509 


0,3378 


0,0000 


0,1123 


0,0000 


0,0046 


0,3712 


0,6242 



0,3129 


0,0000 


0,0000 


0,9694 


0,0000 


1,0000 


0,8672 


0,0016 


0,4337 


0,0000 


0,0000 


0,0290 


0,2534 


0,0000 


0,1328 


0,0000 



Pi: 



0,152 | 0,721 | 0,027 ^),000 



0= {BBACDAA} 



HMM 

Viterbi algorithm 



2 , 2 , 1 , 3 , 4 , 1,1 



1: ‘No expansion’, 2: ’Low expansion’, 3: ’Middle expansion’. 
4: ’Strong expansion’ 



Observation 

o 9 


sequence 


Logarithm 

of 

probability 


Probability of E 9 


A 


BBCDADDBA 


-50.212 


strong 


0.232 


middle 


0.412 


low 


0.061 


null 


0.295 


B 


BBCDADDBB 


-58.453 


strong 


0.129 


middle 


0.850 


low 


0.021 


null 


0.000 


C 


BBCDADDBC 


-52.761 


strong 


0.000 


middle 


0.431 


low 


0.569 


null 


0.000 
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strong 


0.000 


middle 


0.693 


low 


0.307 


null 


0.000 



Figure 7. Interpretation of spatio-temporal event 

Figure 7 shows an example of spatio-temporal analysis 
for multi-date images, the recognition of different events 
(no expansion, low expansion, middle expansion, strong 
expansion) leading to the generation of the output 
sequence. We introduced the following sequence of 
observable states “BBACDAA” which represents the 
evolution of the spatial relationships of 8 synthetic 
images in order to interpret changes this series of 
synthetic images. 

Using the Viterbi algorithm we were able to interpret 
changes in this series of images, where the sequence of 
hidden states is the expansion of area, so we obtained this 
sequence of hidden states “2213411”, thus, each 



predict future changes. We initially detect spatial 
relationships to describe structurally the urban area. We 
modeled the spatio-temporal relations through time with 
Allen's temporal algebra. The variations obtained are 
stored in a database. By introducing these changes we 
were able to interpret changes urban area, and to predict 
future changes. 

Several research axes emerge from this work. We will 
try in future works to improve the performance of this 
methodology and to extend our approach of spatio- 
temporal analysis by adding other indicators of change, 
since in our methodology we used a single source of 
information that is spatial relationships. Other indicators 
can also be used, as factors of change, especially 
demographics data, energy consumption (electricity, gas, 
water,...). In this case, we can use others models such as 



number in this sequence is the expansion of regions at 
time t. 



the Coupled Hidden Markov Model (CHMM), since it 
allows interpretation and the prediction to be performed 



on the basis of two sources of information that are 



processed simultaneously. 
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